Plume is a mission-driven company focused on transforming healthcare for every trans life. The Senior Data Engineer will build, maintain, and optimize data pipelines and applied AI workflows, ensuring high-quality data management and collaboration with various stakeholders.

Responsibilities:

Building and maintaining production-grade data pipelines in cloud data warehouses such as Google BigQuery or equivalent, following architectural standards set by the Director of Data and AI
Designing and developing dbt models across bronze, silver, and gold layers, including a focus on quality and governance via automated tests, documentation, and incremental load strategies
Creating and optimizing Airflow DAGs for data workflow orchestration, including scheduling, dependency management, error handling, and alerting
Implementing dimensional data models and data mart structures, guided by the team's modeling standards, that support clinical BI and ML feature consumption
Crafting easy-to-understand visualizations and dashboards in Looker or equivalent BI tools that follow common business analytics standards, in close collaboration with product analytics, finance, operations, growth, and clinical stakeholders
Integrating healthcare data from sources such as EHRs, Stripe, third-party APIs, and application database feeds, normalizing incoming data into the unified data platform
Applying HIPAA-compliant data handling practices, including PHI/PII masking, tokenization, audit logging, and role-based access controls across all pipeline and AI system work
Architecting and implementing RAG pipelines — including document ingestion, chunking, embedding generation, and retrieval — using frameworks such as LangChain or LangGraph
Supporting MLOps workflows, including model training pipeline maintenance, deployment support, performance monitoring, and retraining triggers
Reviewing teammates' pull requests, providing constructive technical feedback, and upholding the team's engineering standards
Collaborating closely with product managers to understand requirements and deliver reliable data and AI products
Monitoring and triaging assigned pipeline and data quality failures, escalating architectural issues as appropriate
Documenting pipeline designs, data models, and technical decisions in alignment with the team's governance and lineage tracking standards
Evaluating new tools and frameworks, providing hands-on prototyping and technical assessments

Requirements:

5+ years of hands-on experience in data engineering, analytics engineering, or a closely related role
2+ years of experience working within the healthcare industry, including working knowledge of healthcare data standards, clinical workflows, regulated data environments, and domain-specific data visualizations
Working knowledge of HIPAA — including PHI/PII classification, data masking, audit logging, and access control requirements
Proven production experience with at least one major cloud data warehouse: BigQuery, Snowflake, or Redshift — including advanced SQL and query optimization
Strong hands-on experience with dbt (Core or Cloud), including incremental models, tests, documentation, and multi-environment workflows
Deep experience with Apache Airflow for workflow orchestration, including DAG design, scheduling, monitoring, and failure handling
Demonstrated knowledge of dimensional data modeling — star/snowflake schemas, SCD Types 1/2, fact and dimension table design
Hands-on experience delivering dashboards and reports in at least one enterprise BI tool, such as Looker, Power BI, Tableau, or Qlik
Proficiency in Python for data pipeline development, API integrations, and automation (Pandas, PySpark, or similar)
Practical exposure to RAG pipeline development and LLM integration using LangChain, LangGraph, or LlamaIndex
Hands-on exposure to MLOps concepts — model deployment, monitoring, and retraining workflows
Knowledge of CI/CD tooling for data and AI workloads (GitHub Actions, dbt Cloud CI)
Strong understanding of data quality and governance principles (lineage, access controls, data contracts, and automated testing), plus experience with data governance tools such as OpenMetadata
Excellent written and verbal communication skills with the ability to collaborate effectively across engineering, analytics, and clinical teams
Ability to work independently on assigned workstreams while keeping the Director and team informed of progress, blockers, and risks
Experience with real-time or streaming data pipelines using Kafka, Kinesis, or Pub/Sub, particularly for ADT or clinical event feeds
Knowledge of vector databases such as Pinecone, Weaviate, FAISS, or Chroma
Familiarity with responsible AI principles, including bias detection and model explainability in a healthcare context
Experience with data observability tools such as Monte Carlo, Bigeye, or Soda
Familiarity with data lakehouse patterns (Delta Lake, Iceberg, Apache Hudi)
Experience working toward or maintaining SOC2 or HITRUST certification
Familiarity with semantic layer tools (Looker LookML, dbt Semantic Layer)
Experience with population health, revenue cycle, or clinical quality reporting datasets
Exposure to Kubernetes or containerized ML workloads

Senior Data Engineer (Data + Applied AI)
