Fuel Cycle empowers leading organizations with agile research solutions that deliver decision-ready insights. The company is seeking a Senior Data Engineer to design and build the Databricks-first data lake and pipeline infrastructure that its future AI products will depend on.
Responsibilities:
- A multi-tenant data lake and warehouse that unifies all research data — surveys, qualitative feedback, CRM, media transcripts, and more — in a structured, AI-consumable format
- Tenant-isolated data architecture where enterprise client data is structurally separated at the storage and query layer
- Provenance-aware data models where every data point carries full traceability back to its source
- Batch ingestion pipelines that migrate and continuously sync data from existing relational databases and cloud storage into the new lake architecture
- A nightly profile enrichment pipeline that rebuilds living user profiles from all data sources within each client account
- Data access layers serving AI agents via MCP, qualitative search via RAG pipelines, statistical computation tools, REST APIs, and bulk export
Requirements:
- 5+ years of deep, hands-on experience building production lakehouses on Databricks
- Ability to write clean PySpark and Python, model data thoughtfully, and build for a multi-tenant SaaS environment
- Deep production experience across the Databricks platform including Unity Catalog, Delta Live Tables, Databricks SQL, and Workflows
- Delta Lake as a production table format — ACID transactions, schema evolution, performance optimization, and multi-tenant governance via Unity Catalog
- Experience building and maintaining dbt transformation projects using the Databricks adapter in a production environment
- PySpark for large-scale data transformation and batch pipeline authoring
- Strong understanding of batch ingestion pipeline design — migrating from relational sources like MySQL and PostgreSQL into a lakehouse architecture
- Experience with a modern pipeline orchestrator such as Dagster, Prefect, or Databricks Workflows; Dagster experience is a strong positive
- Familiarity with vector databases, embedding pipelines, and RAG patterns for AI workloads — using tools such as Databricks Vector Search, pgvector, or Amazon OpenSearch
- Exposure to AI agent and LLM-serving infrastructure including Amazon Bedrock, AgentCore, and Strands
- Experience with data cataloging and governance tools such as Unity Catalog or OpenMetadata
- Data modeling for multi-tenant analytical workloads — partitioning strategy, schema design, and tenant isolation patterns
- Databricks on AWS — workspace configuration, S3 integration, IAM, and cost governance
- Infrastructure as code using Databricks Asset Bundles or Terraform
- Strong Python and SQL skills
- Databricks certifications — Data Engineer Associate or Professional
- Salesforce or CRM data integration experience
- Prior experience in a multi-tenant SaaS environment with strict data isolation requirements
- Experience migrating from OLTP to a lakehouse architecture
- Experience with AWS-native data services is strongly valued; engineers who understand both Databricks and AWS-native approaches bring a broader architectural perspective that helps the team make better long-term platform decisions
- Apache Iceberg, AWS Glue, Athena, and DynamoDB experience