Leap is one of the fastest-growing benefits solutions and a category-defining pioneer in employer specialty pharmacy. The Senior Data Engineer will own the data pipelines, warehouse, and reporting layer, keeping the infrastructure the company depends on reliable while collaborating with teams across the company.
Responsibilities:
- Build and own data pipelines and ETL for claims ingestion, drug pricing, and CRM sync (BigQuery, Python)
- Design warehouse schemas and transforms that the rest of the company depends on
- Maintain data quality and reliability across systems that feed both human users and AI workloads: row-count checks, schema drift detection, anomaly alerting, and knowing when upstream sources have silently changed, not just whether the job ran (see the sketch after this list)
- Build reporting systems that give sales, clinical, and leadership teams live visibility into the business
- Create automated alerting that surfaces changes as they happen, so the team acts on data instead of asking for it
- Build PHI-safe pipelines that feed LLM workloads, agent systems, and automation
- Design data architecture that connects claims, drug pricing, patient records, CRM activity, and clinical workflows into a usable whole
- Own the ingestion of external data from non-standard formats and sources — we work with many providers who each send data differently, and new sources are added regularly
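To make the data-quality responsibility concrete, here is a minimal sketch of the kind of checks described above (row counts and schema drift) against a BigQuery table, using the google-cloud-bigquery client named in the stack. The table name, expected schema, and row-count threshold are hypothetical placeholders, and the alerting hook is only stubbed out.

```python
# Minimal data quality sketch: row-count sanity check and schema drift detection
# against BigQuery. Table ID, expected columns, and thresholds are hypothetical.
from google.cloud import bigquery

EXPECTED_SCHEMA = {
    "claim_id": "STRING",
    "ndc": "STRING",
    "paid_amount": "NUMERIC",
    "service_date": "DATE",
}
MIN_EXPECTED_ROWS = 10_000  # hypothetical floor for a daily claims load


def check_claims_table(table_id: str = "analytics.claims_daily") -> list[str]:
    """Return human-readable issues; an empty list means the table looks healthy."""
    client = bigquery.Client()
    table = client.get_table(table_id)
    issues: list[str] = []

    # Row-count check: catch an upstream feed that silently shrank or failed,
    # even though the load job itself "succeeded".
    if table.num_rows < MIN_EXPECTED_ROWS:
        issues.append(
            f"{table_id}: only {table.num_rows} rows (expected >= {MIN_EXPECTED_ROWS})"
        )

    # Schema drift detection: compare live columns against what downstream
    # models and reports expect.
    live = {field.name: field.field_type for field in table.schema}
    for col, col_type in EXPECTED_SCHEMA.items():
        if col not in live:
            issues.append(f"{table_id}: missing column {col}")
        elif live[col] != col_type:
            issues.append(f"{table_id}: column {col} changed type {col_type} -> {live[col]}")

    return issues


if __name__ == "__main__":
    for issue in check_claims_table():
        print(f"ALERT: {issue}")  # in practice this would feed the team's alerting channel
```

In practice checks like this would run from the orchestrator (Airflow, Dagster, or similar) after each load and page the team only when something actually changed.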
Requirements:
- Python, SQL, and dbt
- You've worked with BigQuery, Snowflake, or a similar cloud warehouse
- Know your way around orchestration tools (Airflow, Dagster, Prefect, or similar)
- You've built pipelines that other people depend on
- Your schemas are clean and your data models are well-documented
- You use AI tools in your own work
- You know how to build data infrastructure that AI systems can rely on in production
- You've been an early employee, a solo data person, or the one who built the data stack from scratch
- Healthcare or HIPAA experience
- Fivetran or similar ingestion tools
- CRM integrations (Salesforce, HubSpot)
- Experience building data infrastructure for LLM/AI workloads
- Comfort with cloud infrastructure (GCP, AWS) or Linux/sysadmin fundamentals
- You can debug a VM, read logs, and manage services, not just write SQL
- A bias toward simple, cost-effective solutions
- You reach for open-source first and know when a managed service is worth the price and lock-in