About this role

Helps lead projects focused on Disaster Recovery and on managing and maintaining optimal platform infrastructure performance, reliability, and security using SRE practices, observability tools, and manual and automated procedures
Develops complex services to automate monitoring activities and provide critical information to facilitate response and resolution of performance and availability issues and incidents
Troubleshoots and analyzes service disruptions to determine the root cause of issues and develop solutions for improved reliability
Assists application development teams in creating a Disaster Recovery playbook
Leads more complex projects focused on building and maintaining observability and monitoring for the application, including monitoring of key performance indicators
Communicates complex topics to development teams to investigate and document issues, and leads the internal team in developing solutions to mitigate them

A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science)
Requires 4 to 6 years of related experience
One or more of the following skills is desired: Disaster Recovery, AWS, SQL, MongoDB

Senior Site Reliability Engineer

Key skills