Cedar is a leading healthcare technology company focused on improving the healthcare system through data science and smart product design. The Data Engineer III role involves designing and maintaining scalable ELT/ETL pipelines, modernizing legacy data flows, and collaborating with cross-functional teams to ensure reliable and accurate data delivery.
Responsibilities:
- Design, build, and maintain scalable ELT/ETL pipelines that power core use cases including client billing, financial reporting, product analytics, and data services for downstream teams (Finance, Data Science, Commercial Analytics, Product)
- Modernize legacy data flows by migrating SQL- and Liquibase-based transformations into dbt, with solid testing, documentation and data contracts
- Improve reliability and observability of our data platform by applying best practices in testing, monitoring, alerting and runbook-driven operations for pipelines orchestrated via Airflow (and/or similar tools)
- Model data for usability and performance in Snowflake and other systems, applying sound data modeling patterns (e.g., dimensional models, entity-centric designs) for analytics and operational use cases
- Collaborate closely with product, finance, analytics and integrations teams to understand requirements, define interfaces, and ensure data is accurate, well-documented, and delivered in the right form and at the right cadence for consumers
- Contribute to Cedar’s data platform vision by implementing standards for governance, metadata and access, and by helping pilot tools like OpenMetadata and data quality frameworks within your projects
- Participate in code reviews and design discussions, helping to raise the bar on code quality, reliability, and operational excellence across the team
Requirements:
- 3+ years of hands-on data engineering (or closely related software engineering) experience, including building and supporting production data pipelines
- Strong SQL and Python proficiency, with experience implementing data transformations, utilities and tooling (e.g., dbt models, Airflow DAGs, internal scripts)
- Experience with modern data stack tools, including some combination of: Snowflake (or similar cloud data warehouse), dbt, Airflow/Dagster (or similar orchestrator)
- Comfort designing and operating reliable pipelines, including applying testing strategies (unit/integration/dbt tests), basic monitoring and alerting, and contributing to incident/root-cause analysis
- Experience with data modeling and schema design for analytics and reporting use cases (e.g., star/snowflake schemas, event or entity-centric designs)
- Familiarity with cloud platforms, ideally AWS (e.g., S3, IAM, containerized workloads, or related infrastructure supporting data workloads)
- Strong collaboration and communication skills, with the ability to break down ambiguous business problems into clear technical tasks and work effectively with partners across engineering, product and business teams
- Eagerness to learn and take ownership in a complex, evolving environment; comfortable asking questions, making trade-offs explicit, and driving your work to completion
- Experience with metadata and data governance tools, such as OpenMetadata, DataHub or similar catalogs, and with implementing data contracts or quality frameworks (e.g., Great Expectations, dbt tests)
- Exposure to streaming and event-driven data pipelines (e.g., Kafka, CDC tools) and integrating those into warehouse-centric architectures
- Prior experience in healthcare, fintech, or other highly regulated domains, particularly with standards like HL7 or FHIR, or with complex billing/financial data flows
- Familiarity with analytics and visualization tools (e.g., Looker, Hex) and enabling self-serve analytics through well-designed semantic layers and models
- Experience contributing to team-level standards, patterns, and roadmaps for data engineering or platform teams (even if not as primary owner)