About this role

UST is a mission-driven company that transforms lives through technology. They are seeking a highly motivated Senior Site Reliability Engineer to own uptime and performance for customer-facing services, lead incident management, and establish continuous monitoring for AWS and on-prem systems.

Responsibilities:

Own uptime and performance for customer-facing services using SLO/SLI frameworks and error budgets; drive blameless postmortems and corrective actions
Lead incident management, triage, and root cause analysis; coordinate cross-functional response and stakeholder communications
Establish and tune alerting, dashboards, and log analytics using CloudWatch, AppDynamics, and Dynatrace for continuous monitoring of AWS and on-prem systems

Requirements:

5-6 years of hands-on experience in production operations/SRE and release engineering
Strong experience managing hybrid environments (AWS + on-prem)
Proven expertise in CI/CD (Jenkins)
Proficiency in Python and Shell scripting; Linux administration and JVM tuning
Deep experience with observability (CloudWatch, AppDynamics, Dynatrace)
Solid database fundamentals and query troubleshooting across SQL/NoSQL
