Thyme Care is a market-leading value-based oncology care enabler, dedicated to improving the experience for people with cancer. The Data Integration Engineer will be responsible for ensuring reliable data flow from partners and vendors, collaborating with various teams to enhance data ingestion processes, and improving data models and pipelines.
Responsibilities:
- Collaborate closely with our Product Manager and other Data teammates focused on data ingestion and analytics engineering
- Support ingestion of a wide range of healthcare-related sources (claims, eligibility, prior auth, ADT, etc.): configuring net-new ingestions (parsing file specs, validating assumptions, communicating inconsistencies), debugging issues in ongoing ones, and helping standardize our processes and pipelines
- Collaborate with data scientist deal owners and internal stakeholders to turn messy, ambiguous requirements into concrete mapping/validation logic and durable data contracts
- Use Dagster and GitHub Actions to orchestrate and automate the early stages of our data pipelines, improving run reliability and reducing manual intervention
- Work hands-on with raw data using Jupyter Notebooks in Databricks to investigate data issues, validate assumptions, and unblock processing
- Design and support incremental data loads (append/merge/upsert patterns) and safe reprocessing (idempotent runs, late-arriving data, backfills)
- Learn to use Datadog and PagerDuty to monitor pipelines, triage incidents during business hours, communicate impact clearly, and drive root-cause fixes to prevent recurrences
- Contribute to a complex, self-hosted dbt monorepo: implement transformations, incremental models, tests, documentation, and conventions that scale across deals
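To make the incremental-load responsibility concrete, here is a deliberately simplified sketch of an idempotent merge/upsert that tolerates late-arriving data. The record shape and field names (`claim_id`, `updated_at`) are hypothetical, and in practice this logic would live in dbt/Databricks rather than in-memory Python:

```python
from typing import Dict, Iterable

Record = Dict[str, object]

def upsert(target: Dict[str, Record], batch: Iterable[Record],
           key: str = "claim_id", version: str = "updated_at") -> Dict[str, Record]:
    """Merge `batch` into `target`, keyed on `key`.

    - New keys are appended.
    - An existing key is overwritten only when the incoming record's
      `version` is strictly newer, so late-arriving (stale) rows are
      ignored and replaying the same batch is a no-op (idempotent).
    """
    for rec in batch:
        k = rec[key]
        current = target.get(k)
        if current is None or rec[version] > current[version]:
            target[k] = rec
    return target
```

Because a replay is a no-op, safe reprocessing and backfills reduce to running older batches back through the same merge: their rows only land where they are still the newest version.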
Requirements:
- Strong SQL skills
- Familiarity with dbt, including experience working in larger or more complex projects, and an interest in deepening your expertise
- Working knowledge of Python for data investigation in notebooks
- Experience operating data pipelines: debugging failures, tracing issues across systems, and communicating clearly about root cause and mitigation
- Experience with testing and data quality: writing and maintaining tests and using failures/alerts to drive durable fixes
- Responsiveness and the ability to stay calm and organized when triaging failing ingestion runs or pipelines
- Willingness to learn new domains and tools quickly (new partner file formats, evolving standards, Databricks), and apply feedback without ego
- The ability to engage technical and non-technical stakeholders to explain what's happening in our pipelines and identify opportunities to improve transparency and alerting
- Nice-to-have: exposure to healthcare data (claims, eligibility, ADT, etc.)