Salma Health is reimagining brain healthcare by integrating care delivery, technology innovation, and research breakthroughs. They are seeking a mid-level Data Engineer to build and maintain the data platform that supports their mental and behavioral health practice, focusing on data extraction, transformation, and orchestration.
Responsibilities:
- Maintaining and improving the orchestration layer: Dagster assets, jobs, schedules, sensors, and the dependency graph that ties extraction → loading → transformation together
- Adding new data sources to the pipeline: extracting from APIs (GraphQL, REST), Google Drive folders, and CSV/JSONL drops on S3, then landing them in our bronze schemas via Dagster assets
- Building silver and gold dbt models that transform raw source data into our unified entity model following the medallion architecture
- Extending our semantic layer so business metrics are available to downstream consumers (BI tool dashboards, AI agents, ad-hoc analysis) without re-deriving logic
- Operating the platform on AWS: ECS Fargate services, RDS, S3, Secrets Manager, CloudFormation templates, and the CodePipeline-based CI/CD that deploys our data platform (all of our data platforms are deployed with IaC tools)
- Writing tests (pytest for Python, dbt tests for models, data quality tests) and contributing to internal documentation as new patterns emerge
Requirements:
- 4-7 years of professional experience building and operating data pipelines in production
- From conversation to shipped data product: you're comfortable owning a request end-to-end, which means scoping it with a non-technical stakeholder, writing requirements clear enough that you (and others) can build against them, implementing the models or metrics, and verifying with the stakeholder that what shipped solves their problem
- Strong Python: comfortable writing modules, structuring code for reuse and testability, and debugging issues across an async or orchestrated pipeline
- Solid SQL skills, including window functions, CTEs (including recursive ones), and the ability to reason about query performance
- Hands-on experience with dbt: building models, writing tests, and understanding materializations
- Working knowledge of an orchestration framework (Dagster, Airflow, Prefect, or similar), including the mental model of assets/tasks, dependencies, and scheduling
- Comfort with AWS fundamentals: S3, IAM, Secrets Manager, and either ECS or Lambda for compute
- Git-based workflows: code review and writing PRs that are easy to review
Nice to have:
- Experience with Dagster specifically
- Experience with semantic layer tools (Cube.js, dbt Semantic Layer/MetricFlow, LookML)
- Healthcare data experience (HIPAA, EHR systems, ICD-10/CPT codes)
- CloudFormation, Terraform, or another IaC tool
- Experience with GraphQL APIs as a consumer (pagination, introspection, dealing with rate limits and retries)
- Familiarity with identity resolution patterns or slowly-changing dimension modeling