Optum is a global leader in health care innovation, developing cutting-edge solutions to improve health systems. As a Senior Site Reliability Engineer, you will enhance the reliability and efficiency of the Optum Consumer Payment Network by leveraging modern cloud technologies and fostering a strong DevOps culture across engineering teams.
Responsibilities:
- Enable teams to define, measure, and meet reliability goals (SLIs/SLOs) by strengthening post-incident learning, reducing alert noise, and helping teams create and maintain quality runbooks
- Build and enhance shared observability capabilities (metrics, monitoring, logging, dashboards, and alerting) to support >99.95% availability for business-critical applications
- Partner with software engineers across the organization, providing hands-on guidance and establishing patterns for engineering excellence initiatives (e.g., zero-downtime deployments, automated remediation)
- Use AI-assisted tooling to improve engineering productivity (e.g., incident analysis, automation, and documentation)
- Provide 24×7 production support via a rotating on-call schedule
Requirements:
- 5+ years of experience with DevOps, security best practices, CI/CD, infrastructure as code (IaC), and observability (e.g., GitHub, Datadog, New Relic or Dynatrace, Terraform, PagerDuty)
- 3+ years of experience operating production applications, including Kubernetes-based workloads, in enterprise-scale hybrid environments (on-premises and public cloud)
- 2+ years of experience with a programming or scripting language for automation/tooling (e.g., .NET/C#, Java, Python, Go)
- 1+ year of experience with AIOps and/or AI-powered coding and analysis tools for faster root cause analysis (RCA), alert noise reduction, and anomaly detection
- Bachelor's or master's degree in computer science, software engineering, or a related field
- Working knowledge of cloud networking, cloud security, containerization, centralized logging, and monitoring
- Experience with cloud security controls such as DDoS protection, vulnerability management, and patching
- Experience with payment industry standards, protocols, and security best practices
- Solid foundation in Linux and/or Windows operating systems and troubleshooting tools