Optum is a global leader in health care innovation, developing cutting-edge solutions to improve health systems. The Senior Cloud Engineer will ensure the reliability and security of public cloud environments, focusing on incident response, automation, and Infrastructure as Code while collaborating with application teams.
Responsibilities:
- Operate and support public cloud environments running production workloads
- Apply Well-Architected Framework principles (security, reliability, performance efficiency, cost optimization, operational excellence) to drive continuous improvement
- Support day-to-day operations, including on-call rotation and scheduled maintenance
- Coordinate and execute planned changes to a high standard, without service impact
- Lead incident response, participate in war rooms, perform root cause analysis, and drive corrective actions
- Improve monitoring, alerting, and observability across cloud environments
- Design, build, and maintain Infrastructure as Code using Terraform and GitHub Actions
- Develop automation using Python (and PowerShell where appropriate) to reduce operational toil
- Promote standardized, repeatable deployment patterns
- Reduce configuration drift and ensure consistency across dev, test, and prod environments
- Leverage Generative AI tools to improve development and operational efficiency
- Embed security best practices into cloud designs (identity, networking, encryption, secrets management)
- Enforce enterprise security, governance, and compliance requirements
- Identify and remediate cloud vulnerabilities within defined SLAs
- Partner with security teams to meet regulatory and compliance expectations
- Identify and execute cloud cost optimization opportunities
- Serve as a trusted technical advisor to application teams adopting public cloud
- Mentor junior engineers and contribute to shared documentation and operational playbooks
- Stay current on enterprise cloud platforms, tooling, and industry best practices
- Demonstrate strong ownership, initiative, and a commitment to mastering public cloud operations
- Leverage enterprise-approved AI tools to streamline workflows, automate tasks, and drive continuous improvement
Requirements:
- Undergraduate degree or equivalent practical experience
- 8+ years of hands-on experience in cloud engineering, infrastructure, or platform operations
- 8+ years supporting mission-critical production environments
- 3+ years developing Infrastructure as Code (Terraform) and automation (Python)
- Solid production experience in Azure (preferred); AWS or GCP acceptable with demonstrated ability to ramp quickly into Azure
- Deep, demonstrated understanding of cloud infrastructure, networking, identity, and security fundamentals
- Experience working incidents in ServiceNow and participating in war rooms for high-severity production issues
- Demonstrated automation skills using Python
- Ability to participate in an on-call rotation and perform off-hours maintenance
- Practical experience applying SRE principles (error budgets, reliability metrics, incident postmortems) in production environments
- Advanced public cloud certification (e.g., Azure Solutions Architect Expert)
- Experience with CI/CD pipelines and Git-based workflows, including GitHub Actions
- Experience with Splunk or similar observability tools, including alerts, dashboards, and reporting
- Demonstrated familiarity with cloud cost optimization / FinOps practices
- Experience using AI-assisted development tools (e.g., GitHub Copilot) to improve automation and Infrastructure as Code