VySystems is focused on ensuring the production reliability and operational excellence of its AI platform. The Site Reliability Engineer will focus on resilience, performance, and incident management to support mission-critical platforms in US enterprises.
Responsibilities:
- Ensure production reliability and operational excellence of the AI platform, with a strong focus on resilience, performance, and incident management
Requirements:
- 7-12+ years of experience in SRE / production engineering
- Deep expertise in observability tools and incident management systems
- Experience supporting mission-critical platforms in US enterprises
- Strong understanding of distributed systems and cloud reliability patterns