Fuel Cycle is a market research disruptor that empowers organizations with agile research solutions. They are seeking a Senior Data Engineer to design and build a Databricks-first data lake and pipeline infrastructure that will support future AI products.
Responsibilities:
- A multi-tenant data lake and warehouse that unifies all research data — surveys, qualitative feedback, CRM, media transcripts, and more — in a structured, AI-consumable format
- Tenant-isolated data architecture where enterprise client data is structurally separated at the storage and query layer
- Provenance-aware data models where every data point carries full traceability back to its source
- Batch ingestion pipelines that migrate and continuously sync data from existing relational databases and cloud storage into the new lake architecture
- A nightly profile enrichment pipeline that rebuilds living user profiles from all data sources within each client account
- Data access layers serving AI agents via MCP, qualitative search via RAG pipelines, statistical computation tools, REST APIs, and bulk export
- Integrate quickly with the engineering team and contribute meaningfully to the data platform build
- Take ownership of assigned pipeline and infrastructure work end-to-end, from design through production
- Bring architectural recommendations and solutions proactively rather than waiting for direction
- Collaborate and communicate effectively across engineering and product teams
Requirements:
- 5+ years of deep, hands-on experience building production lakehouses on Databricks
- You write clean PySpark and Python, model data thoughtfully, and know how to build for a multi-tenant SaaS environment
- Deep production experience across the Databricks platform including Unity Catalog, Delta Live Tables, Databricks SQL, and Workflows
- Delta Lake as a production table format — ACID transactions, schema evolution, performance optimization, and multi-tenant governance via Unity Catalog
- Experience building and maintaining dbt transformation projects using the Databricks adapter in a production environment
- PySpark for large-scale data transformation and batch pipeline authoring
- Strong understanding of batch ingestion pipeline design — migrating from relational sources like MySQL and PostgreSQL into a lakehouse architecture
- Experience with a modern pipeline orchestrator such as Dagster, Prefect, or Databricks Workflows; Dagster experience is a strong positive
- Familiarity with vector databases, embedding pipelines, and RAG patterns for AI workloads — using tools such as Databricks Vector Search, pgvector, or Amazon OpenSearch
- Exposure to AI agent and LLM-serving infrastructure including Amazon Bedrock, AgentCore, and Strands
- Experience with data cataloging and governance tools such as Unity Catalog or OpenMetadata
- Data modeling for multi-tenant analytical workloads — partitioning strategy, schema design, and tenant isolation patterns
- Databricks on AWS — workspace configuration, S3 integration, IAM, and cost governance
- Infrastructure as code using Databricks Asset Bundles or Terraform
- Strong Python and SQL skills
- Proactive Ownership: You bring recommendations and solutions to your manager — you don't wait to be told what to do
- Architectural Judgment: You have the judgment to make the right foundational decisions and defend them
- Greenfield Builder: You thrive on building from scratch and take full ownership from design through production
- Comfort with Ambiguity: You can translate a high-level vision into a concrete engineering plan
- Outsized Impact: You understand that on a small team your decisions have outsized and lasting impact
- Databricks certifications — Data Engineer Associate or Professional
- Salesforce or CRM data integration experience
- Prior experience in a multi-tenant SaaS environment with strict data isolation requirements
- Experience migrating from OLTP to a lakehouse architecture
- Experience with AWS-native data services is strongly valued. Engineers who understand both Databricks and AWS-native approaches bring a broader architectural perspective that helps the team make better long-term platform decisions
- Apache Iceberg, AWS Glue, Athena, and DynamoDB experience