Natera is a global leader in cell-free DNA testing, dedicated to oncology, women’s health, and organ health. They are seeking a highly skilled and motivated Senior DevOps/SRE Engineer to establish and maintain robust development and operations practices across multiple product teams, focusing on cloud infrastructure and automation.
Responsibilities:
- Own end-to-end execution of the Laboratory Operations Software release process, ensuring smooth, timely releases with minimal downtime
- Continuously monitor the effectiveness of the release process and implement improvements to increase efficiency, reduce errors, and enhance overall quality
- Act as an internal consultant and subject matter expert, coaching individual product teams on best-in-class DevOps practices, including infrastructure-as-code (IaC), monitoring, logging, and security integration
- Embed with development teams to assess and improve DevOps maturity, delivery practices, and operational readiness
- Design and implement projects that support rapid growth in application complexity and enable innovation
- Provide hands-on guidance in CI/CD, cloud infrastructure usage, Kubernetes operations, and observability
- Help teams adopt existing infrastructure, platforms, and tooling provided by central Cloud / Platform teams
- Promote and reinforce technical standards, guardrails, and best practices that allow teams to operate autonomously while remaining compliant and secure
- Guide teams in applying organizational expectations around reliability, security, and cost management through automation rather than manual controls
- Serve as a feedback channel to central platform and cloud teams, sharing adoption challenges and improvement opportunities
- Continuously improve and automate infrastructure provisioning, configuration management, application deployment, and testing using tools such as Terraform, Kubernetes, and CI/CD pipelines
- Advocate for automation-first approaches to reduce operational toil and risk
- Partner with teams to define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and operational dashboards for their services
- Guide teams through incident response, post-incident reviews, and reliability improvements
- Identify systemic reliability issues and escalate platform-level concerns to the appropriate owning teams
- Drive capacity planning and performance tuning activities to ensure scalability and efficiency
- Provide expert-level support for complex infrastructure and deployment issues escalated by the product teams
- Assist teams in root cause analysis and long-term remediation
- Create and maintain clear, comprehensive documentation, including runbooks and documentation of the release process, CI/CD pipelines, and regression testing procedures
- Share best practices and lessons learned across teams to raise overall DevOps maturity
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
- 7+ years of professional software engineering experience building production-grade systems, with emphasis on automation, integrations, and infrastructure tooling
- Excellent problem-solving skills with the ability to troubleshoot complex issues in a fast-paced environment
- Excellent communication, coaching, and collaboration skills, with the ability to work effectively across teams and convey technical concepts to non-technical stakeholders
- Deep understanding of Site Reliability Engineering (SRE) principles, including SLIs, SLOs, error budgets, and toil reduction
- Expertise in setting up and managing comprehensive monitoring, logging, and alerting systems
- Proven experience with incident response and leading post-incident review (post-mortem) processes
- Experience with capacity planning, performance analysis, and optimization of distributed systems
- Strong expertise in CI/CD tools (e.g., Jenkins, GitLab CI)
- Practical experience building complex CI/CD pipelines
- Proficiency in at least one programming language (e.g., Java, Python)
- Ability to read and understand Java code
- Strong command of the AWS stack
- Proficiency in Docker, Kubernetes, and Helm
- Experience working with SQL and relational databases (e.g., MySQL, PostgreSQL)
- Experience with version control systems (e.g., Git)
- Experience working with Terraform
- Knowledge of regression testing frameworks and tools (e.g., JUnit)
- Experience with Agile/Scrum methodologies
- Experience working in regulated environments such as healthcare