Altarum is building the future of data and AI infrastructure for public health and is looking for a Principal Data Engineer – ML Platforms to help lead the way. In this role, you will design, build, and operationalize modern data and ML platform capabilities that power analytics, evaluation, AI modeling, and interoperability across all Altarum divisions.
Responsibilities:
- Design and operate a modern, cloud-agnostic lakehouse architecture using object storage, SQL/ELT engines, and dbt
- Build CI/CD pipelines for data, dbt, and model delivery (GitHub Actions, GitLab, Azure DevOps)
- Implement MLOps systems: MLflow (or equivalent), feature stores, model registry, drift detection, automated testing
- Engineer solutions in AWS and AWS GovCloud today, with portability to Azure Government or GCP
- Use Infrastructure-as-Code (Terraform, CloudFormation, Bicep) to automate secure deployments
- Build scalable ingestion and normalization pipelines for healthcare and public health datasets, including:
  - FHIR R4 / US Core (strongly preferred)
  - HL7 v2 (strongly preferred)
  - Medicaid/Medicare claims & encounters (strongly preferred)
  - SDOH & geospatial data (preferred)
  - Survey, mixed-methods, and qualitative data
- Create reusable connectors, dbt packages, and data contracts for cross-division use
- Publish clean, conformed, metrics-ready tables for Analytics Engineering and BI teams
- Support Population Health in turning evaluation and statistical models into pipelines
- Define SLOs and alerting; instrument lineage & metadata; ensure ≥95% of data tests pass
- Perform performance and cost tuning (partitioning, storage tiers, autoscaling) with guardrails and dashboards
- Build production-grade pipelines for risk prediction, forecasting, cost/utilization models, and burden estimation
- Develop ML-ready feature engineering workflows and support time-series/outbreak detection models
- Integrate ML assets into standardized deployment workflows
- Build ingestion and vectorization pipelines for surveys, interviews, and unstructured text
- Support RAG systems for synthesis, evaluation, and public health guidance
- Enable Palladian Partners with secure, controlled-generation environments
- Translate R/Stata/SAS evaluation code into reusable pipelines
- Build templates for causal inference workflows (DID, AIPW, CEM, synthetic controls)
- Support operationalization of ARA’s applied research methods at scale
- Implement Model Context Protocol (MCP) and fairness/explainability tooling (SHAP, LIME)
- Ensure compliance with HIPAA, 42 CFR Part 2, IRB/DUA constraints, and NIST AI RMF standards
- Enforce privacy-by-design: tokenization, encryption, least-privilege IAM, and VPC isolation
- Develop runbooks, architecture diagrams, repo templates, and accelerator code
- Pair with data scientists, analysts, and SMEs to build organizational capability
- Provide technical guidance for proposals and client engagements
Requirements:
- 7–10+ years in data engineering, ML platform engineering, or cloud data architecture
- Expert in Python, SQL, dbt, and orchestration tools (Airflow, Glue, Step Functions)
- Deep experience with AWS + AWS GovCloud
- CI/CD and IaC experience (Terraform, CloudFormation)
- Familiarity with MLOps tools (MLflow, SageMaker, Azure ML, Vertex AI)
- Ability to operate in regulated environments (HIPAA, 42 CFR Part 2, IRB)
- Experience with FHIR, HL7, Medicaid/Medicare claims, and/or SDOH datasets
- Experience with Databricks, Snowflake, Redshift, or Synapse
- Experience with event streaming (Kafka, Kinesis, Event Hubs)
- Feature store experience
- Familiarity with observability tooling (Grafana, Prometheus, OpenTelemetry)
- Experience optimizing BI datasets for Power BI