Ping Identity is a company focused on making digital experiences secure and seamless for all users. As a Senior Manager in Site Reliability Engineering, you will lead a team responsible for building and maintaining the infrastructure of a large identity platform, ensuring operational excellence and collaboration across various teams.
Responsibilities:
- Lead and mentor a team of 8-10 SREs
- Oversee and maintain our production infrastructure hosted on AWS with a 99.99%+ uptime SLA
- Collaborate with other SRE, Security, and Development teams
- Define processes for the team to efficiently meet target dates
- Drive large projects to completion working with multiple Development teams
- Perform capacity analysis and planning
- Effectively manage and scale infrastructure through automation standards
- Analyze complex system behavior, performance and application issues
- Oversee observability and analysis across multiple datacenters
Requirements:
- 6+ years of experience leading a software-focused SRE team of 8-10 staff
- Experience working in organizations with a global presence
- Ability to drive build-vs-buy decisions
- Experience developing, maintaining, and administering modern infrastructure tooling, with an emphasis on Infrastructure as Code (IaC)
- Experience provisioning public cloud resources using IaC tools such as CloudFormation and Terraform
- Knowledge of scripting and programming standards (Python/Ruby/Bash/Go/etc.)
- Experience with Docker and container orchestration (Kubernetes)
- Experience using Git in a large team environment
- Experience with security design principles
- Experience in a high-volume or critical production service environment
- Familiarity with IP networking, including the functionality, operation, and failure modes of networks
- Familiarity with observability tooling such as New Relic, Splunk, Grafana, and CloudWatch
- Familiarity with DevOps automation tools such as Jenkins, Artifactory, and Spacelift
- Solid experience configuring servers with Puppet/Chef/Salt