Progressive Insurance is dedicated to helping employees move forward in their careers. As a DevOps Engineer Lead or Consultant within its Cloud organization, you will drive the strategy and execution of Site Reliability Engineering (SRE) practices while partnering with teams to enhance the reliability and performance of cloud applications.
Responsibilities:
- Drive the strategy, standards, and execution of Site Reliability Engineering (SRE) and Enterprise Monitoring (EM) practices across platforms
- Partner closely with the EM team and SRE leaders to strengthen reliability, enhance observability maturity, reduce operational toil, and improve the resiliency and performance of cloud-hosted and hybrid applications
- Lead engineering initiatives focused on automated recovery, service level management, performance optimization, and end-to-end observability
- Influence teams across IT to adopt SRE principles, improve supportability, evolve monitoring solutions, and implement patterns that drive consistent, reliable operations
- Ensure cloud services scale effectively, remain secure, and meet evolving expectations around performance and uptime while fostering a culture of continuous improvement, engineering excellence, and accountability
Requirements:
- Demonstrated leadership of SRE initiatives, including resiliency improvements, incident reduction, SLI/SLO development, error‑budget practices, and automation of manual workflows
- Experience operating across hybrid cloud environments (AWS, Azure, OpenStack) and container platforms such as Kubernetes or OpenShift
- Strong collaboration with monitoring and reliability engineering teams to enhance observability, supportability, and incident readiness using tools like Splunk, AppDynamics, or other APM/observability suites
- Proven ability to apply SRE principles to analyze complex systems, identify reliability gaps, and drive long‑term improvements in stability and performance
- Proven influence across cross‑functional engineering teams, with a strong foundation in cloud security concepts and experience in incident, problem, and change management processes