About this role

Zoom is seeking a Senior Site Reliability Engineer to support its Kubernetes platforms and customer-facing data systems. The role focuses on improving system reliability, scalability, and operations across distributed infrastructure and data platforms, in collaboration with the Infrastructure, Data Platform, and Application Engineering teams.

Responsibilities:

Support Kubernetes platforms and customer-facing data systems
Improve system reliability, scalability, and day-to-day operations across distributed infrastructure and data platforms
Partner with Infrastructure, Data Platform, and Application Engineering teams to reduce operational workload, improve incident response, and drive automation across multi-region environments

Requirements:

Have 6+ years of experience in SRE, Platform Engineering, or Infrastructure roles
Show hands-on experience with Kubernetes (K8s) in production environments
Have experience with Linux systems, networking fundamentals, and distributed systems
Show experience with monitoring and observability tools (Prometheus, Grafana, Datadog, PagerDuty, etc.)
Demonstrate effective programming/scripting skills in Python, Go, or Shell
Be able to build and operate CI/CD pipelines (GitHub Actions, Jenkins, ArgoCD, etc.) and support data platforms (Spark, Trino/Presto, Airflow, Kafka) in production
Have hands-on experience with cloud platforms (AWS, GCP, Azure), along with incident management, troubleshooting, and root cause analysis (RCA) skills
Bonus: experience with data platform reliability and automation, and with AI-assisted operations
