Qualified Health applies generative AI to healthcare, with a focus on safe AI governance and real-time algorithm monitoring. The company is seeking a Lead Data Engineer to develop clinical intelligence layers, design data transformation pipelines, and collaborate with clinical subject-matter experts (SMEs) to improve AI-driven clinical insights.
Responsibilities:
- Design and build clinical annotation pipelines that extract conditions, medications, and procedures from unstructured clinical notes
- Implement negation and temporal detection to distinguish current conditions from historical findings (critical for clinical decision-making)
- Build business rules engines that classify medications, calculate risk scores, and apply clinical logic at scale
- Integrate clinical reference data (drug databases, terminology mappings) into transformation pipelines
- Optimize data structures to reduce LLM processing time and improve downstream AI performance
- Build production-grade pipelines using PySpark and Databricks for large-scale clinical data processing
- Implement data quality frameworks to validate clinical transformations and catch issues before they reach AI workflows
- Design feature stores that serve pre-computed clinical features to ML models and LLM applications
- Maintain pipeline observability with monitoring, alerting, and performance tracking
- Partner with clinical SMEs to translate medical knowledge into data transformation logic
- Define data contracts with the AI team to ensure feature outputs meet LLM workflow requirements
- Contribute to technical standards and best practices for clinical data engineering
Requirements:
- 8+ years of data engineering experience, with demonstrated expertise in building production data pipelines
- 5+ years on Databricks, including PySpark, Delta Lake, and Unity Catalog
- Healthcare data experience: Prior work with FHIR APIs, EHR databases, or claims data
- Clinical text processing experience: Built pipelines that extract entities from unstructured clinical notes using tools like spaCy, medspaCy, or cloud NLP services
- Feature engineering for ML/AI: Experience preparing data for machine learning models or LLM consumption
- Data quality mindset: Track record of implementing validation frameworks and monitoring for data pipelines
- Healthcare terminology: Familiarity with ICD-10, RxNorm, SNOMED CT, LOINC
- Epic Clarity experience: Direct work with Epic's relational database structure
- Azure cloud platform: Hands-on experience with Azure Databricks, Data Lake Storage, and Service Bus
- Clinical NLP tools: Experience with Azure Text Analytics for Health, Amazon Comprehend Medical, or similar
- RAG architecture patterns: Understanding of vector databases and retrieval-augmented generation