Ping Identity is a company dedicated to making digital experiences secure and seamless for users. As Senior Manager, Site Reliability Engineering (SRE), you will lead a team that manages the infrastructure of a large-scale identity platform, with a focus on operational excellence and automation.

Responsibilities:

Provide leadership and mentorship to a team of 8-10 Site Reliability Engineers (SREs)
Define, measure, and report on key Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure the platform meets its 99.99%+ uptime Service Level Agreement (SLA)
Collaborate effectively with other SRE, Security, and Development teams
Define and implement processes that help the team meet target deadlines efficiently
Drive the successful completion of large-scale projects, coordinating with multiple Development teams
Conduct thorough capacity analysis and planning
Effectively manage and scale infrastructure by establishing and adhering to automation standards
Analyze and resolve complex issues involving system behavior, performance, and applications
Oversee comprehensive observability and analysis across multiple datacenters

Requirements:

Minimum of six years of experience leading a software-focused SRE team of eight to ten engineers
Demonstrated experience working within organizations operating on a global scale
Proven ability to drive strategic decisions regarding 'build vs. buy' technology choices
Proficiency in developing, maintaining, and administering modern infrastructure tooling, with a strong emphasis on Infrastructure as Code (IaC) principles
Experience provisioning public cloud resources utilizing IaC tools such as CloudFormation and Terraform
Solid knowledge of scripting and programming languages (e.g., Python, Ruby, Bash, Go) and their coding standards
Experience with Docker and container orchestration platforms (e.g., Kubernetes)
Practical experience using Git in a large-scale team environment
Understanding of security design principles and experience applying them
Experience operating within a high-volume or mission-critical production service environment
Expertise in IP networking, including familiarity with network functionality, operational procedures, and failure modes
Familiarity with observability tooling such as New Relic, Splunk, Grafana, and CloudWatch
Familiarity with DevOps automation tools such as Jenkins, Artifactory, and Spacelift
Solid experience with server configuration management using Puppet, Chef, or Salt
