Helps lead projects focused on Disaster Recovery and on managing and maintaining optimum platform infrastructure performance, reliability, and security using SRE practices, observability tools, and manual and automated procedures
Develops complex services to automate monitoring activities and provide critical information to facilitate response and resolution of performance and availability issues and incidents
Troubleshoots and analyzes service disruptions to determine the root cause of issues and develop solutions for improved reliability
Assists application development teams in creating a Disaster Recovery playbook
Leads more complex projects focused on building and maintaining observability/monitoring for the application and on monitoring key performance indicators
Communicates complex topics with development teams to investigate and document issues, and leads the internal team in developing solutions to mitigate them
Requirements
A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science)
Requires 4 to 6 years of related experience
One or more of the following skills are desired: Disaster Recovery, AWS, SQL, MongoDB
Tech Stack
AWS
MongoDB
SQL
Benefits
competitive pay
health insurance
401(k) and stock purchase plans
tuition reimbursement
paid time off plus holidays
flexible approach to work with remote, hybrid, field or office work schedules