ESO is a fast-paced, growing company in data, technology, and research, passionate about improving community health and safety through the power of data. As a Site Reliability Engineer, you will ensure the reliability, scalability, and performance of production systems while collaborating with engineering and infrastructure teams to troubleshoot issues and enhance automation.
Responsibilities:
- Support and improve cloud-based production environments and applications
- Troubleshoot incidents, identify root causes, and implement long-term fixes
- Monitor system health, performance, and reliability using observability tools
- Investigate application and database performance issues, including writing and optimizing SQL queries
- Contribute to automation, CI/CD, and operational improvements
- Collaborate with development teams to support application and database troubleshooting
- Maintain documentation and support continuous improvement initiatives
Requirements:
- 5+ years of hands-on experience working in cloud-based production environments and with modern operational platforms
- Strong troubleshooting, analytical, and problem-solving skills
- Strong understanding of SQL, automation/scripting, and monitoring concepts
- Experience with CI/CD practices and modern operational tooling
- Clear written and verbal communication skills
- Experience with Linux, Kubernetes, Terraform, or Git-based workflows