Architect, implement, and maintain data ingestion and transformation pipelines using modern workflow orchestration tools (e.g., Dagster).
Identify, catalog, and integrate internal and external data sources used across research efforts.
Operationalize bioinformatics pipelines that support large-scale batch processing, incremental updates, and backfills within AWS.
Normalize and structure heterogeneous data into consistent, reusable representations that support downstream analysis, modeling, and querying.
Populate and maintain patient-centric data models in shared storage systems (e.g., graph and relational databases).
Collaborate with backend and AI engineers to design data-access patterns that support analytics applications and AI-driven interactions.
Contribute to backend services and APIs that expose integrated data to internal tools and applications.
Participate in the evolution of AI-enabled analysis workflows, including tooling that supports LLM- or agent-based interactions with data.
Contribute to system-level design decisions around data flow, service boundaries, reliability, and scalability.
Write clean, tested, and well-documented Python code that meets production software engineering standards.
Debug and resolve complex data quality, pipeline, backend, and infrastructure issues in a distributed environment.
Requirements
BS in Computer Science, Bioinformatics, Computational Biology, or a related field; MS preferred.
4+ years of experience in production data engineering or software engineering.
Ability to independently drive technical solutions from high-level goals, exercising sound judgment in system design, implementation, and tradeoff evaluation.
Strong proficiency in Python, with experience writing maintainable, production-quality code across data and backend contexts.
Extensive experience with software engineering fundamentals, design patterns, version control, CI/CD, Docker, and automated testing.
Experience designing and operating workflow orchestration systems (Dagster preferred; Airflow, Prefect, or similar acceptable).
Experience building or contributing to backend services (e.g., FastAPI or similar frameworks).
Hands-on experience with AWS services commonly used in data and backend systems (e.g., S3, ECS, Batch, Lambda).
Experience deploying and operating large-scale data or bioinformatics pipelines in AWS, including managing throughput, cost, and operational reliability.
Experience with relational databases (Postgres, MySQL) and/or graph databases (Neo4j), including schema and query design.
Experience contributing to system-level architecture, including data modeling, service boundaries, and operational robustness.
Ability to work effectively with scientists, bioinformaticians, and ML practitioners in an R&D environment.
Tech Stack
Airflow
AWS
Docker
MySQL
Neo4j
Postgres
Python
Benefits
Comprehensive medical, dental, vision, life, and disability plans for eligible employees and their dependents.
Free testing for Natera employees and their immediate families in addition to fertility care benefits.