Ping Identity is a company dedicated to making digital experiences secure and seamless for users. As a Senior Manager of SRE, you will lead a team in managing the infrastructure of a large identity platform, focusing on operational excellence and automation.
Responsibilities:
- Provide leadership and mentorship to a team of 8-10 Site Reliability Engineers (SREs)
- Define, measure, and report on key Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure the platform meets its 99.99%+ uptime Service Level Agreement (SLA)
- Collaborate effectively with other SRE, Security, and Development teams
- Define and implement processes that help the team meet target deadlines efficiently
- Drive the successful completion of large-scale projects, coordinating with multiple Development teams
- Conduct thorough capacity analysis and planning
- Effectively manage and scale infrastructure by establishing and adhering to automation standards
- Analyze complex system behavior and resolve performance and application issues
- Oversee comprehensive observability and analysis across multiple datacenters
Requirements:
- Minimum of six years of experience leading a software-focused SRE team of 8-10 engineers
- Demonstrated experience working within organizations operating on a global scale
- Proven ability to drive strategic "build vs. buy" technology decisions
- Proficiency in developing, maintaining, and administering modern infrastructure tooling, with a strong emphasis on Infrastructure as Code (IaC) principles
- Experience provisioning public cloud resources using IaC tools such as CloudFormation and Terraform
- Solid knowledge of scripting and programming languages (e.g., Python, Ruby, Bash, Go)
- Experience with Docker and container orchestration platforms (e.g., Kubernetes)
- Practical experience using Git in a large-scale team environment
- Understanding of security design principles and experience applying them
- Experience operating within a high-volume or mission-critical production service environment
- Expertise in IP networking, including familiarity with network functionality, operational procedures, and failure modes
- Familiarity with observability tooling such as New Relic, Splunk, Grafana, and CloudWatch
- Familiarity with DevOps automation tools such as Jenkins, Artifactory, and Spacelift
- Solid experience with server configuration management using Puppet, Chef, or Salt