Sentara Health is hiring a Data Engineer to support a modern data platform built on Databricks. This fully remote role will focus on building scalable data pipelines, ensuring data quality and governance, and collaborating with architects and stakeholders on data onboarding and requirements.
Responsibilities:
- Develop and maintain data pipelines using PySpark and Databricks
- Work within a metadata-driven ingestion framework to onboard new datasets
- Implement data quality checks and validation rules within pipelines
- Support ingestion from file-based sources and ingestion tools (e.g., Fivetran)
- Handle schema changes, incremental loads, and file processing patterns
- Contribute to data governance practices including tagging, metadata, and lineage
- Troubleshoot and resolve pipeline failures and performance issues
- Collaborate with architects and stakeholders on data onboarding and requirements
- Follow and contribute to coding standards, reusable components, and best practices
Requirements:
- Relevant experience accepted in lieu of a Bachelor's degree
- 3 to 5 years of relevant experience required: 3+ years with a degree, or 5+ years without one
- Hands-on experience with PySpark and Databricks
- Strong SQL skills
- Experience building ETL/ELT data pipelines
- Understanding of Delta Lake concepts (merge, schema evolution, partitions)
- Familiarity with cloud platforms (Azure preferred)
- Basic experience with Git and version control
- Exposure to data catalog or governance tools (e.g., DataHub)
- Experience with Fivetran or similar ingestion tools
- Understanding of data quality and validation concepts
- Experience working with metadata-driven frameworks
- Strong problem-solving and debugging skills
- Ability to work in a structured, framework-driven environment
- Focus on data quality, not just pipeline execution
- Willingness to learn and adapt in a fast-evolving data ecosystem