Medeloop is a technology company focused on transforming healthcare through AI. They are seeking a Senior AI Data Engineer to architect the data backbone for their AI platform, working closely with data scientists and product teams to manage and enhance healthcare data workflows.
Responsibilities:
- The healthcare data lake: curating, extending, and evolving it through new concepts, derived variables, and data models that directly inform our AI engines and customer products
- AI-native data workflows: designing and operating AI-powered pipelines (using tools like Claude Code and agent frameworks) to automate harmonization, cleaning, quality checks, and summarization at scale
- NLP and semantic infrastructure: building pipelines for entity extraction, concept normalization, embedding-based retrieval, and semantic search that power the AI Scientist platform
- Novel data extraction approaches: experimenting with and building new methodologies for working with unstructured clinical data, not just applying existing playbooks
- Research-grade data products: delivering analytical samples, cohorts, and final datasets that withstand scientific scrutiny and are actively used by researchers and customers
- Data governance and observability protocols: access controls, PHI/PII handling, data classification, compliance, monitoring, alerting, data freshness, and comprehensive documentation to enable self-service capabilities
Requirements:
- 3+ years of relevant data engineering or data management experience within an analytics-driven organization, with end-to-end ownership from raw ingestion to final data product
- Deep hands-on experience with healthcare CDMs (OMOP, FHIR, PCORnet) — designing or extending them, not just querying
- Knowledge of medical ontologies: UMLS, SNOMED CT, RxNorm
- Experience with big data tooling (such as Spark, Iceberg, EMR) and with data pipelines that support retrieval-augmented generation (RAG), vector integrations, embedding workflows, and other AI/ML workloads
- Fluent in Python and SQL; comfortable across structured and unstructured data
- Proven NLP experience: semantic search, entity recognition, concept normalization, embedding pipelines
- Strong grasp of inferential statistics and cohort methodology to be a real partner to data scientists and customers (as part of onboarding)
- Experience contributing to an AI/ML product, especially in automated research or scientific discovery
- Experience mentoring other engineers and providing technical leadership
- Multi-cloud experience (AWS, Azure, GCP)
- Authorship or contribution to peer-reviewed publications or technical reports