Vaiticka Solution is seeking a Senior Data Engineer with deep healthcare domain expertise to lead the design and delivery of a large-scale OCI-based Healthcare Analytics Data Lake. The role involves integrating clinical and claims data and serving as the primary onshore technical lead for end-to-end development, client coordination, and production deployment.
Responsibilities:
- We are seeking a highly experienced Senior Data Engineer to support the design and development of a large-scale Healthcare Analytics Data Lake that integrates clinical (HL7, FHIR, CCDA) and claims (EDI 837/835) data
- The engineer will work onsite as the primary technical liaison, supporting end-to-end development, collaborating with offshore teams, and coordinating with client stakeholders for QA and deployment activities
- This role requires strong hands-on engineering skills, deep healthcare data expertise, and proficiency with modern cloud-native data architectures, ideally on Oracle Cloud Infrastructure (OCI)
- Data Pipeline Development: Design and implement large-scale data ingestion, parsing, and transformation pipelines using Python, Spark, PySpark, and Spark SQL. Build and optimize metadata-driven pipelines for flexible ingestion and transformation
- Process multi-format healthcare data including EDI 837/835, HL7 v2, CCDA, and FHIR bundles
- Cloud-Native Engineering (OCI Preferred): Develop and operate data pipelines using OCI services, including OCI Data Integration, OCI Data Flow (Spark), OCI Delta Lake, OCI Autonomous Database, and the OIC integration engine for parsing clinical and claims data. Ensure performance tuning, scalability, cost optimization, and production stability
- Data Lake & Medallion Architecture: Build Delta Lake/Parquet-based data lakes following the Medallion Architecture (Bronze → Silver → Gold). Implement CDC, schema evolution, data quality checks, and validation frameworks
- Data Modelling & Healthcare Domain Expertise: Develop canonical clinical and claims data models aligned to healthcare CDMs. Map and normalize data to industry terminologies such as LOINC, SNOMED CT, ICD-9/10, CPT, and RxNorm
- DevOps, DataOps & Orchestration: Implement CI/CD pipelines using Git, Terraform, and automated deployment workflows. Develop orchestrations and workflows with built-in data lineage, auditability, monitoring, and governance. Establish DataOps best practices for automated testing, observability, and metadata management
- Onsite Leadership & Client Coordination: Act as the primary onshore engineering lead between offshore teams and client stakeholders
- Facilitate handovers to QA for SIT/UAT, coordinate deployment cycles, and support production readiness. Conduct architecture walkthroughs, design reviews, and requirement mapping sessions
Requirements:
- 10+ years in Data Engineering with strong hands-on development experience
- Expert-level skills in Python, PySpark/Spark SQL, and JSON and XML processing
- Experience with EDI 837/835, HL7, CCDA, FHIR, Delta Lake, Parquet, and schema evolution
- Strong understanding of data modelling, healthcare CDMs, data governance, lineage and audit frameworks, metadata-driven architectures, data pipeline orchestration, and cloud and DevOps practices
- Hands-on experience with Git, CI/CD, Terraform, and DataOps automation
- Familiarity with healthcare terminology standards: LOINC, SNOMED, ICD, CPT, RxNorm
- Strong communication, client‑facing presence, and ability to work independently onsite
- Ability to coordinate offshore development teams
- Excellent documentation and technical leadership capabilities
- Experience with cloud-native platforms, preferably OCI