ESO is a fast-paced, growing data, technology, and research company passionate about improving community health and safety through the power of data. As a Site Reliability Engineer, you will ensure the reliability, scalability, and performance of production systems while collaborating with engineering and infrastructure teams to troubleshoot issues and enhance automation.

Responsibilities:

Support and improve cloud-based production environments and applications
Troubleshoot incidents, identify root causes, and implement long-term fixes
Monitor system health, performance, and reliability using observability tools
Investigate application and database performance issues, including writing and optimizing SQL queries
Contribute to automation, CI/CD, and operational improvements
Collaborate with development teams to support application and database troubleshooting
Maintain documentation and support continuous improvement initiatives

Requirements:

5+ years of hands-on experience working in cloud-based production environments and with modern operational platforms
Strong troubleshooting, analytical, and problem-solving skills
Strong understanding of SQL, automation/scripting, and monitoring concepts
Experience with CI/CD practices and modern operational tooling
Clear written and verbal communication skills
Experience with Linux, Kubernetes, Terraform, or Git-based workflows