Definitive Healthcare transforms data and analytics into actionable intelligence for the healthcare industry. The company is seeking a Data Engineer to build scalable data pipelines and manage complex healthcare datasets within a cloud-native architecture.
Responsibilities:
- Develop and maintain robust data pipelines using Python, Spark, Databricks, SQL, and SSIS
- Develop, maintain, and optimize ETL/ELT workflows using SQL Server Integration Services (SSIS) within Visual Studio, enabling reliable data ingestion, transformation, and automation across enterprise data pipelines
- Build reliable, repeatable processes that support the ingestion and transformation of large healthcare datasets
- Integrate data from diverse sources (AWS, on‑prem, third‑party vendors) into our enterprise data platform
- Work with a wide range of file formats including CSV, XML, Parquet, Delta, and more
- Apply strong data quality, cleansing, and curation practices to ensure accuracy and consistency
- Optimize storage and compute resources for performance, cost, and scalability
- Automate observability and monitoring across data pipelines and workloads
- Implement and manage Unity Catalog for metadata, lineage, and access control
- Ensure adherence to data governance, security, and privacy standards
- Maintain clear documentation, data dictionaries, and lineage tracking
- Contribute to automation of data observability and governance workflows
- Tune and optimize Spark jobs for speed, reliability, and cost efficiency
- Diagnose and resolve performance bottlenecks across distributed systems
- Apply JVM tuning and Spark optimization techniques to improve throughput
- Support and enhance our Medallion architecture (bronze/silver/gold) to improve data quality and usability (a brief illustration follows this list)
- Ensure data is processed, enriched, and validated at each stage of the lifecycle
- Partner with data scientists, analysts, product teams, and business stakeholders to understand data needs
- Implement CI/CD pipelines to streamline deployment and testing of data assets
- Stay current with emerging technologies and bring forward recommendations to evolve our data platform
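To make the Medallion bullet above concrete, here is a minimal, hypothetical PySpark sketch of a bronze-to-silver promotion step on Databricks with Delta tables. The catalog, table, and column names are assumptions for illustration only, not Definitive Healthcare's actual schema or pipeline code.

```python
# Hypothetical sketch: promoting raw records from a bronze Delta table to a
# curated silver table. Table, column, and rule choices are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

# Bronze: raw, append-only ingest of vendor files (CSV/XML/Parquet landed as Delta)
bronze = spark.read.table("healthcare.bronze.claims_raw")

# Silver: cleansed, deduplicated records with a minimal validation rule applied
silver = (
    bronze
    .dropDuplicates(["claim_id"])                         # drop re-delivered records
    .filter(F.col("claim_id").isNotNull())                # enforce a basic quality rule
    .withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
    .withColumn("_processed_at", F.current_timestamp())   # audit column for lineage
)

(
    silver.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("healthcare.silver.claims")
)
```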
Requirements:
- Strong programming experience in SQL and in either Python or Scala
- Hands‑on experience with Apache Spark and Databricks
- Knowledge of data cleansing, curation, and quality frameworks
- Familiarity with Unity Catalog or other metadata management tools
- Understanding of data governance, security, and compliance best practices
- Experience working with AWS cloud services
- Experience implementing or working within a Medallion architecture
- Strong analytical and problem‑solving abilities
- Excellent communication and cross‑functional collaboration skills
- Ability to work independently and within a team environment
- High attention to detail and commitment to quality
- AWS certifications (e.g., AWS Certified Data Analytics)
- Experience with SQL and NoSQL databases
- Background in a fast‑paced, data‑centric SaaS or healthcare environment