Gentiva is a company focused on transforming the delivery of data-driven insights in healthcare. The Databricks Data Engineer will design and build robust data pipelines, optimize their performance, and ensure data accuracy and security.
Responsibilities:
- Translate business requirements into technical specifications and document solution designs, data flows, and architecture
- Design, develop, and maintain ETL/ELT pipelines using Azure Data Factory, Databricks, and Apache Spark
- Implement Delta Lake architecture for reliable data storage and processing
- Build and optimize data workflows using Databricks Workflows and Jobs
- Develop scalable data models following medallion architecture (bronze, silver, gold layers)
- Implement Unity Catalog for data governance, access control, and metadata management
- Create and maintain Databricks notebooks for data transformation and analysis
- Optimize Spark jobs for performance and cost efficiency
- Implement data quality checks and validation frameworks
- Collaborate with BI developers, data analysts, and data scientists
- Design and implement data orchestration workflows using Azure Data Factory to coordinate complex ETL/ELT processes
- Develop and maintain CI/CD pipelines for data workflows
- Monitor data pipeline performance and troubleshoot issues
- Document data processes, architectures, and best practices
- Ensure compliance with data security and privacy regulations
- Provide support for new and existing solutions
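The medallion architecture mentioned above (raw bronze data, cleaned silver data, business-level gold aggregates) can be illustrated with a toy, stdlib-only Python sketch. The field names here are hypothetical; a real Databricks pipeline would use Spark DataFrames and Delta tables rather than plain dictionaries.

```python
from statistics import mean

# Bronze: raw records ingested as-is, including bad rows.
bronze = [
    {"patient_id": "p1", "visit_cost": "120.50"},
    {"patient_id": "p2", "visit_cost": "80.00"},
    {"patient_id": None, "visit_cost": "35.00"},   # malformed row
    {"patient_id": "p1", "visit_cost": "200.00"},
]

# Silver: cleaned and typed -- drop rows missing a key, cast cost to float.
silver = [
    {"patient_id": r["patient_id"], "visit_cost": float(r["visit_cost"])}
    for r in bronze
    if r["patient_id"] is not None
]

# Gold: business-level aggregate -- average visit cost per patient.
gold = {}
for r in silver:
    gold.setdefault(r["patient_id"], []).append(r["visit_cost"])
gold = {pid: round(mean(costs), 2) for pid, costs in gold.items()}

print(gold)  # {'p1': 160.25, 'p2': 80.0}
```

Each layer only ever reads from the layer below it, which is what makes failed loads easy to replay: the bronze copy is never mutated.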
Requirements:
- Bachelor's degree in Computer Science, Information Technology or related field
- 5+ years of progressive experience in data engineering, analytics, or software development
- 3+ years of hands-on experience with Databricks platform
- Strong experience with Apache Spark and PySpark
- Excellent problem-solving and analytical skills
- Strong oral and written communication abilities
- Self-motivated with ability to adapt to new technologies quickly
- Team player with ability to work independently
- Detail-oriented with strong organizational skills
- Ability to manage multiple priorities and meet deadlines
- Experience communicating technical concepts to non-technical stakeholders
- Expert-level knowledge of Databricks Workspace, clusters, and notebooks
- Delta Lake implementation and optimization
- Unity Catalog for data governance and cataloging
- Databricks SQL (formerly SQL Analytics)
- Databricks Workflows and job orchestration
- Delta Live Tables (DLT) for pipeline orchestration and data quality
- Advanced Python programming (PySpark, pandas, NumPy)
- Advanced SQL (query optimization, performance tuning)
- Git version control and collaborative development
- Azure Databricks
- Cloud storage services (ADLS Gen2, Azure Blob Storage)
- Azure Data Factory for pipeline orchestration and integration
- Experience designing and managing Azure Data Factory pipelines, triggers, and linked services
- Infrastructure as Code (Terraform)
- Experience with BI tools (Power BI, SSRS)
- Data warehousing and data modeling concepts
- SQL Server, including SSIS (Integration Services)
- Scala programming
- Healthcare IT or healthcare data experience
Preferred certifications:
- Databricks Certified Data Engineer Associate (strongly preferred)
- Databricks Certified Data Engineer Professional
- Databricks Lakehouse Fundamentals
- Azure Data Engineer Associate (DP-203)
- Apache Spark certifications
- Experience with complex data modeling including dimensional modeling, star/snowflake schemas
- Experience with medallion architecture (bronze/silver/gold layers)
- Data quality and validation framework implementation
- CI/CD pipeline development for data workflows (Azure DevOps)
- Performance tuning and cost optimization
- DataOps and DevOps practices
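The data-quality and validation framework work listed above amounts, at its core, to rule-based expectation checks evaluated per record (Delta Live Tables expresses these declaratively as expectations). A minimal stdlib-Python sketch, with hypothetical rule and field names:

```python
# Minimal rule-based validation sketch (hypothetical rule names; on a real
# Databricks pipeline this would be done with DLT expectations or similar).

def validate(record, rules):
    """Return the names of all rules the record fails."""
    return [name for name, check in rules.items() if not check(record)]

rules = {
    "patient_id_present": lambda r: bool(r.get("patient_id")),
    "cost_non_negative": lambda r: r.get("visit_cost", 0) >= 0,
}

records = [
    {"patient_id": "p1", "visit_cost": 120.5},
    {"patient_id": "", "visit_cost": -5.0},
]

failures = {i: validate(r, rules)
            for i, r in enumerate(records) if validate(r, rules)}
print(failures)  # {1: ['patient_id_present', 'cost_non_negative']}
```

Keeping rules as named, independent predicates makes failure reports auditable, which matters under the healthcare compliance requirements this role carries.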