Collaborate with Platform and Engineering teams on reliability improvements
Provide L2/L3 application support coverage during:
Support team resource shortages
High-severity incidents (SEVs)
Peak support periods or escalations
Triage and troubleshoot application issues using existing runbooks and dashboards
Collaborate with Application Support and Engineering teams during incidents
Ensure all actions, findings, and resolutions are documented in ServiceNow (SNOW)
Requirements
Strong experience as a Site Reliability Engineer or Reliability Engineer
Deep hands-on expertise with **Grafana** (dashboards, alerting, troubleshooting)
Solid experience with monitoring and observability systems
Production experience operating **Kubernetes** environments
Experience supporting systems in **GCP** and on-prem environments (mandatory)
Strong **Linux** systems and troubleshooting skills
Fluent **English** (written and spoken)
Ability to work in the **PST time zone**
Ability to participate in an **on-call rotation** that includes coverage for one weekend day. Time worked during the weekend is compensated with one day off during the week, in accordance with the established work schedule.