Salma Health is reimagining brain healthcare by integrating care delivery, technology innovation, and research breakthroughs. They are seeking a mid-level Data Engineer to build and maintain the data platform that supports their mental and behavioral health practice, focusing on data extraction, transformation, and orchestration.

Responsibilities:

Maintaining and improving the orchestration layer: Dagster assets, jobs, schedules, sensors, and the dependency graph that ties extraction → loading → transformation together
Adding new data sources to the pipeline: extracting from APIs (GraphQL, REST), Google Drive folders, and CSV/JSONL drops on S3, then landing them in our bronze schemas via Dagster assets
Building silver and gold dbt models that transform raw source data into our unified entity model following the medallion architecture
Extending our semantic layer so business metrics are available to downstream consumers (BI tool dashboards, AI agents, ad-hoc analysis) without re-deriving logic
Operating the platform on AWS: ECS Fargate services, RDS, S3, Secrets Manager, CloudFormation templates, and the CodePipeline-based CI/CD that deploys it. All of our data infrastructure is deployed with infrastructure-as-code (IaC) tools
Writing tests (pytest for Python, dbt tests for models, data quality tests) and contributing to internal documentation as new patterns emerge

Requirements:

4-7 years of professional experience building and operating data pipelines in production
From conversation to shipped data product: you're comfortable owning a request end-to-end, including scoping it with a non-technical stakeholder, writing requirements clear enough that you (and others) can build against them, implementing the models or metrics, and verifying with the stakeholder that what shipped solves their problem
Strong Python: comfortable writing modules, structuring code for reuse and testability, and debugging issues across an async or orchestrated pipeline
Solid SQL skills, including window functions, CTEs (including recursive ones), and the ability to reason about query performance
Hands-on experience with dbt: building models, writing tests, and understanding materializations
Working knowledge of an orchestration framework (Dagster, Airflow, Prefect, or similar), including the mental model of assets/tasks, dependencies, and scheduling
Comfort with AWS fundamentals: S3, IAM, Secrets Manager, and either ECS or Lambda for compute
Git-based workflows, including code review and writing reviewable PRs

Nice to have:

Experience with Dagster specifically
Experience with semantic layer tools (Cube.js, dbt Semantic Layer/MetricFlow, LookML)
Healthcare data experience (HIPAA, EHR systems, ICD-10/CPT codes)
Experience with CloudFormation, Terraform, or another IaC tool
Experience with GraphQL APIs as a consumer (pagination, introspection, dealing with rate limits and retries)
Familiarity with identity resolution patterns or slowly-changing dimension modeling