Imagine Pediatrics is a tech-enabled, pediatrician-led medical group focused on enhancing care for children with special health care needs. As a Staff Data Engineer, you will define how data moves through the platform and own the data pipelines that support clinical analytics and operational reporting, collaborating closely with engineering teams across the company.
Responsibilities:
- Design, build, and maintain scalable ELT pipelines that ingest data from clinical systems, APIs, and third-party integrations using webhook-based, API-based, and CDC (change data capture) approaches
- Architect and manage event-driven data pipelines in AWS — including cross-account configurations and dead-letter queue handling
- Write and maintain infrastructure-as-code to deploy and manage data ingestion workloads, primarily extending existing modules and patterns
- Orchestrate pipeline execution and monitoring using Dagster, ensuring observability and reliability across all workflows
- Implement data quality checks, alerting, and lineage tracking across the pipeline
- Identify and eliminate systemic failure modes in pipelines, improving reliability through long-term fixes rather than repeated incident remediation
- Partner with Analytics Engineers to ensure upstream data supports correct and consistent downstream models
- Set technical direction for data architecture and mentor other engineers
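Several of the responsibilities above (event-driven ingestion, dead-letter queue handling, data quality checks) fit a common pattern. A minimal sketch in pure Python, where the schema fields and the in-memory dead-letter list are illustrative assumptions rather than the team's actual implementation:

```python
import json
from dataclasses import dataclass, field

# Hypothetical required schema for an incoming clinical event payload.
REQUIRED_FIELDS = {"patient_id", "event_type", "occurred_at"}


@dataclass
class PipelineResult:
    accepted: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)  # stands in for an SQS DLQ


def validate(event: dict) -> bool:
    """Basic data quality check: required fields present and non-empty."""
    return REQUIRED_FIELDS.issubset(event) and all(event[f] for f in REQUIRED_FIELDS)


def ingest(raw_messages: list[str]) -> PipelineResult:
    """Parse and validate webhook payloads.

    Unparseable or invalid messages are routed to a dead-letter queue for
    later inspection instead of failing the whole batch, so one bad
    upstream payload cannot block the pipeline."""
    result = PipelineResult()
    for raw in raw_messages:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            result.dead_letter.append(raw)
            continue
        (result.accepted if validate(event) else result.dead_letter).append(event)
    return result
```

In a production deployment the dead-letter list would be an SQS dead-letter queue and the batch loop a Lambda handler or Dagster op, but the routing logic is the same.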
Requirements:
- 7–10+ years of data engineering or platform engineering experience, including 2+ years in a senior or staff-level role owning production data systems
- Strong experience designing data pipelines using Python and SQL
- Strong experience with AWS services including Lambda, SQS, SNS, and S3
- Strong experience building event-driven and API-based ingestion systems (e.g., webhooks, asynchronous processing, or CDC patterns)
- Experience with data orchestration tools such as Dagster (or similar)
- Experience working with infrastructure-as-code (Terraform), primarily extending and adapting existing modules and patterns
- Experience with cloud data warehouses, preferably Snowflake, including performance-aware SQL development
- Proficiency in at least one programming language beyond SQL and Python (JavaScript, TypeScript, or Go) for automation, tooling, or serverless functions
- Demonstrated use of modern software engineering practices including version control, CI/CD, testing, and code review
- Proven ability to troubleshoot complex data and infrastructure issues across multiple systems and clearly communicate findings to both technical and non-technical stakeholders
- Proven ability to reason about downstream analytical impact of data pipeline design, including data freshness, grain, and transformation behavior
- Experience working closely with analytics engineering, data modeling, or similar downstream consumers of data
- Experience designing or managing IAM policies and least-privilege access models across data platform services
- Experience with dbt or modern analytics engineering workflows
- Experience working with healthcare data, FHIR resources, or clinical systems
- Familiarity with HIPAA compliance and handling of PHI in cloud environments
- Experience with high-volume ingestion systems including webhook-based tools (e.g., Hevo, Fivetran, or similar)
- Experience driving the adoption of AI tools to improve engineering productivity
- Exposure to real-world evidence (RWE), health economics and outcomes research (HEOR), or similar evidence-generation programs
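To make the CDC pattern referenced above concrete: a change feed can be applied idempotently with an upsert keyed on the source primary key, so replayed or late-arriving events never corrupt the target table. A sketch using sqlite3 as a stand-in for a warehouse (the `patients` table and columns are hypothetical):

```python
import sqlite3


def apply_cdc_batch(conn: sqlite3.Connection, changes: list[tuple]) -> None:
    """Apply a batch of CDC events idempotently.

    Each change is (op, id, name, updated_at). The upsert only wins when
    the incoming row is newer than the stored one, so replaying a batch
    or receiving events out of order is safe."""
    for op, row_id, name, updated_at in changes:
        if op == "delete":
            conn.execute("DELETE FROM patients WHERE id = ?", (row_id,))
        else:  # insert or update
            conn.execute(
                """
                INSERT INTO patients (id, name, updated_at)
                VALUES (?, ?, ?)
                ON CONFLICT(id) DO UPDATE SET
                    name = excluded.name,
                    updated_at = excluded.updated_at
                WHERE excluded.updated_at > patients.updated_at
                """,
                (row_id, name, updated_at),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id TEXT PRIMARY KEY, name TEXT, updated_at TEXT)")
apply_cdc_batch(conn, [
    ("insert", "p1", "Ada", "2024-01-01"),
    ("update", "p1", "Ada L.", "2024-01-02"),
    ("update", "p1", "stale", "2023-12-31"),  # late-arriving event: ignored
])
```

The same merge-on-key, latest-timestamp-wins logic translates directly to a Snowflake `MERGE` statement in a dbt incremental model.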