Sentara Health is looking for a Remote Lead Databricks Engineer to help build and scale a modern, metadata-driven data platform on Azure. This role involves designing reusable data ingestion frameworks, enabling governance at scale, and integrating with modern data catalog tools and access patterns.
Responsibilities:
- Design and develop scalable, reusable data ingestion frameworks using Azure Databricks and PySpark
- Build metadata-driven pipelines that decouple configuration from execution
- Implement and maintain Bronze, Silver, and Gold data layers following the Medallion architecture
- Integrate data pipelines with data catalog and governance tools (e.g., DataHub, Purview)
- Implement schema enforcement, data contracts, and data quality checks within pipelines
- Work with ingestion tools (e.g., Fivetran) and file-based ingestion patterns
- Enable automation for dataset onboarding using configuration-driven approaches
- Collaborate with architecture, governance, and business teams to ensure scalable and governed data access
- Contribute to the platform's evolution toward event-driven and near real-time ingestion patterns
- Support CI/CD and DevOps practices for data platform components
Requirements:
- 8+ years of experience in data engineering or data platform development with a degree, or 10+ years of relevant experience without a degree
- Strong hands-on experience with Azure Databricks and PySpark
- Proficiency in Python for data engineering and framework development
- Experience with Delta Lake and Lakehouse architecture
- Strong understanding of Medallion architecture (Bronze, Silver, Gold)
- Experience designing reusable and scalable data ingestion frameworks
- Experience working with data catalog and governance tools (e.g., DataHub, Purview, Collibra)
- Solid understanding of data lineage, schema evolution, data contracts, and data quality frameworks
- Experience working with cloud storage (ADLS Gen2) and distributed data processing
- Familiarity with CI/CD pipelines and Git-based workflows