CyberArk, a Palo Alto Networks company, is the global leader in identity security. They are seeking a Staff Site Reliability Engineer to design and implement AWS infrastructure components, lead architecture and management automation, and provide guidance on maintaining reliability and performance of SaaS environments.
Responsibilities:
- Design and implement AWS infrastructure components such as VPCs, EC2, EKS, S3, tagging schemes, CloudFormation, etc.
- Lead the architecture, design, and feature analysis of deployment and management automation for cloud-based infrastructure and software
- Provide guidance to Site Reliability and DevOps Engineers on managing the reliability and performance of SaaS environments, as well as on building automation to prevent problem recurrence
- Architect and guide the team in the use of configuration management tools on both Windows and Linux: CloudFormation, Helm, Terraform, Salt, Ansible
- Ensure cloud-based architectures meet availability and recoverability requirements
- Architect and implement cloud-based monitoring, alerting, and reporting – Datadog, Logz.io, InfluxDB, CloudWatch, Catchpoint, ELK, Grafana
- Provide support and guidance on tooling that enables teams to achieve greater output and reliability
- Maintain a deep understanding of the latest tech solutions and trends, and dive into the details of the architecture as needed
- Work with the Team Leads within the group to identify areas of improvement, prepare architecture road maps, and advocate for them to the Product Management group
Requirements:
- B.S. in Computer Science or equivalent experience
- Minimum 4 years of experience managing AWS infrastructure
- Minimum of 7 years in a senior, architect, or technical lead role in site reliability, systems engineering, or software development
- Deep understanding of site reliability, infrastructure, and cloud platforms
- Expert understanding of and experience with containerization technologies such as Docker and Kubernetes
- Expertise in monitoring and observability tools such as Datadog, InfluxDB, Grafana, Logstash, and Elasticsearch
- Solid understanding of and experience with web services, databases, and related infrastructure and architectures
- Solid understanding of backup/restore best practices
- Strong expertise in writing configuration management languages
- Strong programming expertise in Python, C#, C++, Java, or an equivalent language
- Excellent troubleshooting skills
- Experience supporting an enterprise-level SaaS environment
- Security experience a plus