Natera is a global leader in cell-free DNA testing, focused on oncology, women’s health, and organ health. Natera is seeking a Senior Data Engineer to join its Therapeutics & Innovations group. This role is responsible for designing and implementing robust data ingestion and transformation pipelines that support therapeutic development and scientific innovation.
Responsibilities:
- Architect, implement, and maintain data ingestion and transformation pipelines using modern workflow orchestration tools (e.g., Dagster)
- Identify, catalog, and integrate internal and external data sources used across research efforts
- Operationalize bioinformatics pipelines that support large-scale batch processing, incremental updates, and backfills within AWS
- Normalize and structure heterogeneous data into consistent, reusable representations that support downstream analysis, modeling, and querying
- Populate and maintain patient-centric data models in shared storage systems (e.g., graph and relational databases)
- Collaborate with backend and AI engineers to design data-access patterns that support analytics applications and AI-driven interactions
- Contribute to backend services and APIs that expose integrated data to internal tools and applications
- Participate in the evolution of AI-enabled analysis workflows, including tooling that supports LLM- or agent-based interactions with data
- Contribute to system-level design decisions around data flow, service boundaries, reliability, and scalability
- Write clean, tested, and well-documented Python code that meets production software engineering standards
- Debug and resolve complex data quality, pipeline, backend, and infrastructure issues in a distributed environment
Requirements:
- BS in Computer Science, Bioinformatics, Computational Biology, or a related field; MS preferred
- 4+ years of experience in production data engineering or software engineering
- Ability to independently drive technical solutions from high-level goals, exercising judgment in system design, implementation, and tradeoff evaluation
- Strong proficiency in Python, with experience writing maintainable, production-quality code across data and backend contexts
- Extensive experience with software engineering fundamentals, design patterns, version control, CI/CD, Docker, and automated testing
- Experience designing and operating workflow orchestration systems (Dagster preferred; Airflow, Prefect, or similar acceptable)
- Experience building or contributing to backend services (e.g., FastAPI or similar frameworks)
- Hands-on experience with AWS services commonly used in data and backend systems (e.g., S3, ECS, Batch, Lambda)
- Experience deploying and operating large-scale data or bioinformatics pipelines in AWS, including managing throughput, cost, and operational reliability
- Experience with relational databases (Postgres, MySQL) and/or graph databases (Neo4j), including schema and query design
- Experience contributing to system-level architecture, including data modeling, service boundaries, and operational robustness
- Ability to work effectively with scientists, bioinformaticians, and ML practitioners in an R&D environment
- Experience integrating machine-learning inference outputs into data pipelines
- Familiarity with LLM-based agents and associated frameworks such as LangChain
- Familiarity with bioinformatics data formats and pipelines (e.g., FASTQ, BAM/CRAM, VCF, RNA-seq, WES/WGS)
- Experience with infrastructure as code (Terraform)
- Experience with DNAnexus
- Understanding of genomics, proteomics, or other omics data types and their downstream analytical use cases
- Ability to evaluate build-vs-buy tradeoffs in fast-paced environments