Sentara Health is hiring a Data Engineer to support a modern data platform built on Databricks. The role focuses on building scalable data pipelines, ensuring data quality and governance, and collaborating with stakeholders on data onboarding and requirements.
Responsibilities:
- Develop and maintain data pipelines using PySpark and Databricks
- Work within a metadata-driven ingestion framework to onboard new datasets
- Implement data quality checks and validation rules within pipelines
- Support ingestion from file-based sources and managed ingestion tools (e.g., Fivetran)
- Handle schema changes, incremental loads, and file processing patterns
- Contribute to data governance practices including tagging, metadata, and lineage
- Troubleshoot and resolve pipeline failures and performance issues
- Collaborate with architects and stakeholders on data onboarding and requirements
- Follow and contribute to coding standards, reusable components, and best practices
Requirements:
- Bachelor's degree, or relevant experience in lieu of a degree
- 3 to 5 years of relevant experience: 3+ years with a degree, or 5+ years without a degree
- Hands-on experience with PySpark and Databricks
- Strong SQL skills
- Experience building ETL/ELT data pipelines
- Understanding of Delta Lake concepts (merge, schema evolution, partitions)
- Familiarity with cloud platforms (Azure preferred)
- Basic experience with Git or other version control systems
- Exposure to data catalog or governance tools (e.g., DataHub)
- Experience with Fivetran or similar ingestion tools
- Understanding of data quality and validation concepts
- Experience working with metadata-driven frameworks
- Strong problem-solving and debugging skills
- Ability to work in a structured, framework-driven environment
- Focus on data quality, not just pipeline execution
- Willingness to learn and adapt in a fast-evolving data ecosystem