Principal Site Reliability Engineer at Saviynt | JobVerse
Principal Site Reliability Engineer
Saviynt
Bengaluru, Karnataka, India
Full Time
3 weeks ago
Visa Sponsorship
Key skills
AWS
Distributed Systems
Kubernetes
Python
Go
AI
LLM
OpenAI
EKS
About this role
Role Overview
Own and define the long-term reliability strategy and architecture.
Design planet-scale, highly resilient systems on AWS and Kubernetes (EKS).
Lead the development of autonomous operations platforms powered by AI agents.
Architect and implement LLM-driven SRE systems using tools like the OpenAI API:
Intelligent incident detection and triage.
Automated root cause analysis.
Self-healing remediation systems.
Establish gold standards for SRE practices:
SLOs, SLAs, error budgets.
Incident management frameworks.
Reliability-first system design.
Drive observability architecture at scale (metrics, logs, traces, events).
Lead cross-team initiatives to embed reliability into product and platform design.
Mentor senior engineers and act as a technical authority across teams.
Guide decisions around cost optimization, scalability, and performance trade-offs.
Introduce chaos engineering and resilience testing at a system-wide level.
Requirements
10+ years in SRE / Platform / Distributed Systems Engineering.
Proven experience designing and operating large-scale distributed systems.
Deep expertise in:
AWS architecture at scale
Kubernetes internals and operations
System reliability, scalability, and performance engineering
Strong programming skills (Python / Go) with a focus on building platforms and tools.
Experience leading cross-functional technical initiatives.
Ability to influence architecture and engineering culture across the company.
Experience integrating LLMs into production systems (e.g., via OpenAI API).
Experience building or designing AI-driven automation / AIOps systems.
Strong interest in autonomous systems and self-healing infrastructure.
Tech Stack
AWS
Distributed Systems
Kubernetes
Python
Go
Benefits
Employee wellness programs