Natera is a global leader in cell-free DNA testing, focused on oncology, women’s health, and organ health. Natera is seeking a Senior Data Engineer to join its Therapeutics & Innovations group. This role is responsible for designing and implementing robust data ingestion and transformation pipelines that support therapeutic development and scientific innovation.
Responsibilities:
- Architect, implement, and maintain data ingestion and transformation pipelines using modern workflow orchestration tools (e.g., Dagster)
- Identify, catalog, and integrate internal and external data sources used across research efforts
- Operationalize bioinformatics pipelines that support large-scale batch processing, incremental updates, and backfills within AWS
- Normalize and structure heterogeneous data into consistent, reusable representations that support downstream analysis, modeling, and querying
- Populate and maintain patient-centric data models in shared storage systems (e.g., graph and relational databases)
- Collaborate with backend and AI engineers to design data-access patterns that support analytics applications and AI-driven interactions
- Contribute to backend services and APIs that expose integrated data to internal tools and applications
- Participate in the evolution of AI-enabled analysis workflows, including tooling that supports LLM- or agent-based interactions with data
- Contribute to system-level design decisions around data flow, service boundaries, reliability, and scalability
- Write clean, tested, and well-documented Python code that meets production software engineering standards
- Debug and resolve complex data quality, pipeline, backend, and infrastructure issues in a distributed environment
Requirements:
- BS in Computer Science, Bioinformatics, Computational Biology, or a related field; MS preferred
- 4+ years of experience in production data engineering or software engineering
- Ability to independently drive technical solutions from high-level goals, exercising judgment in system design, implementation, and tradeoff evaluation
- Strong proficiency in Python, with experience writing maintainable, production-quality code across data and backend contexts
- Extensive experience with software engineering fundamentals, design patterns, version control, CI/CD, Docker, and automated testing
- Experience designing and operating workflow orchestration systems (Dagster preferred; Airflow, Prefect, or similar acceptable)
- Experience building or contributing to backend services (e.g., FastAPI or similar frameworks)
- Hands-on experience with AWS services commonly used in data and backend systems (e.g., S3, ECS, Batch, Lambda)
- Experience deploying and operating large-scale data or bioinformatics pipelines in AWS, including managing throughput, cost, and operational reliability
- Experience with relational databases (Postgres, MySQL) and/or graph databases (Neo4j), including schema and query design
- Experience contributing to system-level architecture, including data modeling, service boundaries, and operational robustness
- Ability to work effectively with scientists, bioinformaticians, and ML practitioners in an R&D environment
- Experience integrating machine-learning inference outputs into data pipelines
- Familiarity with LLM-based agents and associated frameworks such as LangChain
- Familiarity with bioinformatics data formats and pipelines (e.g., FASTQ, BAM/CRAM, VCF, RNA-seq, WES/WGS)
- Experience with infrastructure as code (Terraform)
- Experience with DNAnexus
- Understanding of genomics, proteomics, or other omics data types and their downstream analytical use cases
- Ability to evaluate build-vs-buy tradeoffs in fast-paced environments