Zoom is a collaboration platform that helps people stay connected and get more done together. They are looking for a DevOps Engineer to ensure the continuous availability and stability of their Team Chat and Zoom Events platforms through robust operational strategies and monitoring.
Responsibilities:
- Designing and implementing zero-downtime solutions for highly available services (99.999% availability)
- Developing and maintaining disaster recovery (DR) strategies across datacenters in different regions
- Diagnosing and resolving complex production issues, including performance and functional challenges
- Collaborating with vendors, infrastructure teams, and engineering partners to enhance security and service availability
- Administering monitoring tools and infrastructure, and providing troubleshooting support for outages across systems and Zoom backend services
- Developing and implementing CI/CD pipelines to streamline production deployments and configurations
- Participating in on-call shifts and incident management, and working after hours or on weekends for application releases and deployments
- Being the primary contact for your region, addressing program queries, providing support and updates, and managing relevant content and processes
Requirements:
- 2-3 years of experience as a DevOps Engineer or Site Reliability Engineer (SRE)
- Knowledge of CI/CD tools and integration
- Experience with scripting languages such as Bash, Python, or Groovy
- Experience with configuration and deployment management using tools such as Terraform, Ansible, and Jenkins
- Expertise with containerization: Kubernetes, AWS EKS, Docker
- Analytical and troubleshooting skills
- Willingness to learn, be proactive, and think creatively