UST is a mission-driven company that transforms lives through technology. They are seeking a highly motivated Senior Site Reliability Engineer to own uptime and performance for customer-facing services, lead incident management, and establish continuous monitoring for AWS and on-prem systems.
Responsibilities:
- Own uptime and performance for customer-facing services using SLO/SLI frameworks and error budgets; drive blameless postmortems and corrective actions
- Lead incident management, triage, and root cause analysis; coordinate cross-functional response and stakeholder communications
- Establish and tune alerting, dashboards, and log analytics using CloudWatch, AppDynamics, and Dynatrace for continuous monitoring of AWS and on-prem systems
Requirements:
- 5-6 years of hands-on experience in production operations/SRE and release engineering
- Strong experience managing hybrid environments (AWS + on-prem)
- Proven expertise in CI/CD (Jenkins)
- Proficiency in Python and Shell scripting; Linux administration and JVM tuning
- Deep experience with observability (CloudWatch, AppDynamics, Dynatrace)
- Solid database fundamentals and query troubleshooting across SQL/NoSQL