Qualified Health is dedicated to transforming healthcare through innovative data solutions. The company is seeking a Data Integration Engineer to design and build data pipelines that convert raw healthcare data into production-ready datasets for its AI platform, working closely with a Data Integration Manager.
Responsibilities:
- Design and build ETL pipelines using PySpark, SQL, and Azure data services to process healthcare data from multiple source systems
- Execute data extraction and transformation operations on complex healthcare datasets, ensuring accuracy and compliance with established standards
- Develop data quality validation frameworks to identify and resolve issues during integration, QC, and backtesting phases
- Troubleshoot technical issues including data schema mismatches, transformation logic errors, and performance bottlenecks
- Build reusable data components and standardized integration patterns that accelerate future implementations
- Optimize pipeline performance for large-scale healthcare datasets, ensuring efficient processing and resource utilization
- Implement data validation rules specific to healthcare contexts (e.g., clinical code validation, temporal logic checks, referential integrity)
- Write and maintain technical documentation for data pipelines, transformations, and integration patterns
- Support production deployments by coordinating with infrastructure teams and conducting final testing
- Partner with the Data Integration Manager to translate partner requirements into technical specifications
- Participate in technical discussions with partner IT teams to understand data schemas, access methods, and integration constraints
- Provide technical guidance on data mapping specifications and transformation approaches
- Identify data quality issues and work with the Manager to coordinate resolution with partners
- Share technical findings from QC and backtesting with the Manager to inform partner conversations
- Contribute to continuous improvement of tools, processes, and technical standards
Requirements:
- 5+ years of experience in data analytics, data engineering, or solution delivery roles, with demonstrated expertise in data integration and ETL processes
- Strong analytical toolkit: Proficiency in PySpark for distributed data processing, advanced SQL for data querying and transformation, and Excel for data analysis and reporting
- Production ETL experience: Track record of building and maintaining production-grade data pipelines with proper error handling and monitoring
- Data quality focus: Experience implementing validation frameworks and troubleshooting data quality issues
- Healthcare data experience: Prior work with healthcare datasets (EHR, claims, clinical, lab data)
- Problem-solving mindset: Ability to independently diagnose and resolve complex technical issues
- Attention to detail: Commitment to accuracy, testing, and delivering reliable solutions
- Collaborative working style: Comfortable partnering with non-technical colleagues and adapting to feedback
- Bachelor's degree in Computer Science, Engineering, Data Science, Mathematics, or related technical field
- Epic Clarity experience: Direct work with Epic's relational database structure and clinical data models
- Healthcare data standards knowledge: Understanding of FHIR, HL7v2, DICOM, LOINC, SNOMED, ICD-10
- Azure cloud platform: Hands-on experience with Azure Databricks, Data Factory, Blob Storage, Delta Lake
- Healthcare compliance awareness: Understanding of HIPAA requirements and healthcare data security best practices
- Data warehouse/lakehouse experience: Familiarity with dimensional modeling and modern data architecture patterns
- DevOps practices: Experience with Git, CI/CD pipelines, and infrastructure-as-code
- Performance tuning: Proven ability to optimize complex data transformations for scale
- LIMS/PACS experience: Prior work integrating laboratory or imaging systems data
- Multiple data format fluency: Experience with JSON, XML, Parquet, CSV, and other healthcare interchange formats