Zocdoc is a leading digital health marketplace committed to empowering patients. They are seeking a Senior Site Reliability Engineer to develop, monitor, and maintain distributed production systems, ensuring uptime for their services in a cloud environment.
Responsibilities:
- Monitoring and maintaining complex cloud-based infrastructure, systems, and services and ensuring their uptime to help millions of patients get the care they need
- Automating and developing our tooling, processes, and infrastructure to speed up development and make these processes repeatable and error-proof
- Supporting our large product engineering org with their scaling, performance, and uptime needs, as well as helping diagnose and debug production-related issues
- Analyzing and performance-tuning systems, code, and networking for scalability and optimal operation
- Working with cutting-edge GenAI tools and technology
Requirements:
- 5+ years of experience supporting consumer-facing web application production environments and systems in a Site Reliability Engineering or Production Engineering role
- 2+ years of on-call experience in a 24/7 cloud-based production environment
- 2+ years of experience managing and supporting modern cloud-based environments and infrastructure such as AWS/GCP, Docker, and Kubernetes
- Experience with edge technologies such as load balancers, reverse proxies, web application firewalls, and routing
- Deep understanding of protocols such as TCP/IP, HTTP/HTTPS, TLS, DNS, NTP
- A Bachelor's degree in Computer Science, Computer Engineering, or equivalent engineering experience is a plus, but not required