Datavant is the data collaboration platform trusted for healthcare, providing critical data solutions for organizations across the healthcare ecosystem. As a Staff Data Engineer, you will lead the design and build of the next-generation patient data platform, focusing on developing distributed data systems and platform capabilities to enhance the secure and intelligent use of data.
Responsibilities:
- Lead the architecture and development of core data platform capabilities, including processing frameworks, storage patterns, and shared services
- Design and implement multi-tenant, multi-cloud data systems with strong isolation, scalability, and operational durability
- Build and operate large-scale distributed data processing systems across batch and real-time workloads
- Define and evolve data lifecycle patterns, including ingestion, validation, transformation, enrichment, and serving
- Establish data quality gates and validation frameworks to ensure trust, consistency, and auditability
- Design systems that integrate with platform infrastructure, including CI/CD, deployment orchestration, observability, and infrastructure automation
- Make sound architectural decisions across performance, cost, reliability, and maintainability tradeoffs
- Lead ambiguous, high-impact initiatives where both problem definition and solution design require ownership
- Contribute significantly to production code, setting standards for quality, testing, and operability
Requirements:
- 10+ years of experience building data-intensive or distributed systems, with a strong software engineering foundation
- Proven experience designing and operating large-scale data platforms in production
- Deep expertise in distributed data processing systems (e.g., Spark or similar big data technologies)
- Strong software engineering fundamentals, including system design, testing, CI/CD, and production debugging
- Experience building systems in cloud environments (AWS preferred), including storage, compute, and security patterns
- Experience designing multi-tenant systems, with a focus on isolation, scalability, and reliability
- Strong understanding of data modeling, pipeline design, and data quality enforcement
- Ability to navigate ambiguity, evaluate tradeoffs, and drive durable technical decisions
- Track record of being a high-impact, hands-on contributor who leads through both design and execution
- Strong candidates will have experience with several of the following: distributed data processing frameworks (e.g., Spark, Flink, or similar); cloud data platforms (e.g., Databricks, Snowflake, or equivalent); data transformation and modeling frameworks (e.g., dbt or equivalent); workflow orchestration systems (e.g., Airflow or similar); streaming and event-driven systems (e.g., Kafka or equivalent); infrastructure-as-code (e.g., Terraform); and modern table formats and lakehouse architectures (e.g., Iceberg, Delta, or similar)
- Experience building data systems that support AI-driven use cases, including low-latency data access patterns, feature generation and ML data pipelines, and iterative, feedback-driven data workflows
- Familiarity with agentic or AI-assisted coding tools, and the ability to leverage them to improve development velocity and code quality
- Comfort operating in environments where AI augments both system design and development workflows
- Experience in regulated environments (e.g., healthcare, finance)
- Familiarity with interoperability standards (e.g., FHIR, HL7, or similar)
- Experience leading large-scale platform migrations or architectural transformations