Plume is a mission-driven company focused on transforming healthcare for every trans life. The Senior Data Engineer will build, maintain, and optimize data pipelines and applied AI workflows, ensuring high-quality data management and collaboration with various stakeholders.
Responsibilities:
- Building and maintaining production-grade data pipelines in cloud data warehouses such as Google BigQuery or equivalent, following architectural standards set by the Director of Data and AI
- Designing and developing dbt models across bronze, silver, and gold layers, including a focus on quality and governance via automated tests, documentation, and incremental load strategies
- Creating and optimizing Airflow DAGs for data workflow orchestration, including scheduling, dependency management, error handling, and alerting
- Implementing dimensional data models and data mart structures, guided by the team's modeling standards, that support clinical BI and ML feature consumption
- Building clear, easy-to-understand visualizations and dashboards in Looker or equivalent BI tools that follow common business analytics standards, in close collaboration with product analytics, finance, operations, growth, and clinical stakeholders
- Integrating healthcare data from sources such as EHRs, Stripe, third-party APIs, and application database feeds, and normalizing incoming data into the unified data platform
- Applying HIPAA-compliant data handling practices, including PHI/PII masking, tokenization, audit logging, and role-based access controls across all pipeline and AI system work
- Architecting and implementing RAG pipelines — including document ingestion, chunking, embedding generation, and retrieval — using frameworks such as LangChain or LangGraph
- Supporting MLOps workflows, including model training pipeline maintenance, deployment support, performance monitoring, and retraining triggers
- Reviewing teammates' pull requests, providing constructive technical feedback, and upholding the team's engineering standards
- Collaborating closely with product managers to understand requirements and deliver reliable data and AI products
- Monitoring and triaging assigned pipeline and data quality failures, escalating architectural issues as appropriate
- Documenting pipeline designs, data models, and technical decisions in alignment with the team's governance and lineage tracking standards
- Evaluating new tools and frameworks, providing hands-on prototyping and technical assessments
Requirements:
- 5+ years of hands-on experience in data engineering, analytics engineering, or a closely related role
- 2+ years of experience working within the healthcare industry, including working knowledge of healthcare data standards, clinical workflows, regulated data environments, and domain-specific data visualizations
- Working knowledge of HIPAA — including PHI/PII classification, data masking, audit logging, and access control requirements
- Proven production experience with at least one major cloud data warehouse: BigQuery, Snowflake, or Redshift — including advanced SQL and query optimization
- Strong hands-on experience with dbt (Core or Cloud), including incremental models, tests, documentation, and multi-environment workflows
- Deep experience with Apache Airflow for workflow orchestration, including DAG design, scheduling, monitoring, and failure handling
- Demonstrated knowledge of dimensional data modeling — star/snowflake schemas, SCD Types 1/2, fact and dimension table design
- Hands-on experience delivering dashboards and reports in at least one enterprise BI tool, such as Looker, Power BI, Tableau, or Qlik
- Proficiency in Python for data pipeline development, API integrations, and automation (Pandas, PySpark, or similar)
- Practical exposure to RAG pipeline development and LLM integration using LangChain, LangGraph, or LlamaIndex
- Hands-on exposure to MLOps concepts — model deployment, monitoring, and retraining workflows
- Knowledge of CI/CD tooling for data and AI workloads (GitHub Actions, dbt Cloud CI)
- Strong understanding of data quality and governance principles (lineage, access controls, data contracts, and automated testing), plus experience with data governance tools such as OpenMetadata
- Excellent written and verbal communication skills with the ability to collaborate effectively across engineering, analytics, and clinical teams
- Ability to work independently on assigned workstreams while keeping the Director and team informed of progress, blockers, and risks
- Experience with real-time or streaming data pipelines using Kafka, Kinesis, or Pub/Sub, particularly for ADT or clinical event feeds
- Knowledge of vector databases such as Pinecone, Weaviate, FAISS, or Chroma
- Familiarity with responsible AI principles, including bias detection and model explainability in a healthcare context
- Experience with data observability tools such as Monte Carlo, Bigeye, or Soda
- Familiarity with data lakehouse patterns (Delta Lake, Iceberg, Apache Hudi)
- Experience working toward or maintaining SOC 2 or HITRUST certification
- Familiarity with semantic layer tools (Looker LookML, dbt Semantic Layer)
- Experience with population health, revenue cycle, or clinical quality reporting datasets
- Exposure to Kubernetes or containerized ML workloads