Ilant Health is a company focused on leveraging data to enhance clinical precision and business strategy in healthcare. They are seeking a Lead Data Engineer to architect a comprehensive data platform that supports their value-based care models. The role includes overseeing data ingestion, ensuring data quality, and collaborating with data science and product teams.
Responsibilities:
- Design the "Single Patient View": Architect a unified data model that stitches together fragmented data sources (example: linking a pharmacy claim for Wegovy, a clinical lab result for HbA1c, and user engagement metrics from the Ilant app into a cohesive longitudinal record)
- Scalability Planning: Design a cloud-native infrastructure (likely Snowflake/AWS) capable of handling 100x Member growth without requiring a total refactor
- Conversational Intelligence Layer (GenAI/LLM): Architect and implement a "Text-to-Data" interface (leveraging LLMs/RAG) that allows business decision-makers to interact with our data via prompts, similar to Gemini or ChatGPT
- Data Consumption Layer: Ensure the reliability and low-latency availability of the data assets (dbt models, feature stores) consumed by the Data Science and Analytics teams, guaranteeing they always have fresh, trustworthy data for modeling and reporting
- External Data Integration: Own the end-to-end reliability of mission-critical external files. You are responsible for the system that ingests, validates, and standardizes these files from payers/employers
- Claims Ingestion Engine: Build robust, fault-tolerant pipelines to handle the notoriously messy formats of payer data (EDI 837/835, raw CSVs, JSON) and standardize them into a clean, queryable schema
- dbt Model Ownership: Oversee the transformation layer (using dbt), creating a "Gold" layer of data that is business-ready for analysts, product features, and the conversational AI layer
- Pipeline Reliability & Operational Uptime: You own the "uptime" of our data platform. Ensure all scheduled ingestion and transformation jobs run successfully and on time. You are the first line of defense when a pipeline fails, leading the root cause analysis (RCA) and resolution to minimize downtime
- Automated Testing & Observability: Implement "Data Observability" tools (e.g., Great Expectations, Monte Carlo, or custom equivalents) to catch issues before they hit the dashboard (example: configuring alerts to trigger if an eligibility file arrives with 50% fewer records than the previous month)
- Partner with Data Science: Collaborate to productionize predictive models (e.g., patient risk stratification, weight loss trajectory). You will build the MLOps infrastructure that takes a model from a Jupyter notebook to a scalable, real-time inference API within our product
- Partner with Product: Work directly with the CPO and Product Managers to assess the technical feasibility of new features (e.g., "Can we accurately calculate 'time to goal weight' given the current data latency?")
Requirements:
- 7+ years in Data Engineering, with at least 3 years in a Lead or Architectural role
- Demonstrated ability to make high-stakes 'Buy vs. Build' decisions and architect systems for 10x scale, prioritizing long-term stability and maintainability over short-term patches
- Practical experience or strong interest in building semantic layers for LLM applications (RAG, Vector DBs, or prompt engineering for analytics)
- Languages: Python (Advanced), SQL (Expert)
- Cloud: AWS
- Warehousing: Snowflake, BigQuery, or Databricks
- Transformation: dbt (Data Build Tool)
- Orchestration: Airflow, Dagster, or Prefect