ESO is a fast-paced, growing data, technology, and research company passionate about improving community health and safety through the power of data. As a Site Reliability Engineer, you will ensure the reliability, scalability, and performance of production systems while collaborating closely with engineering and infrastructure teams to troubleshoot issues and enhance automation.

Responsibilities:

Support and maintain production and non-production cloud environments (Azure/AWS)
Troubleshoot complex, distributed, cloud-based applications to identify root causes and implement durable fixes
Monitor system health, performance, and reliability using observability tools (e.g., New Relic, ELK, and Zabbix)
Investigate application and database performance issues, including writing and optimizing SQL queries
Participate in incident response, debugging, and post-incident reviews focused on continuous improvement
Contribute to CI/CD pipelines (e.g., Azure DevOps) to improve automation, reliability, and deployment processes
Write and maintain automation scripts (PowerShell, Bash, Python, or similar) to streamline operational workflows
Collaborate with developers to understand code behavior and support troubleshooting efforts in C#/.NET-based systems
Help improve reliability standards, documentation, and operational best practices

Requirements:

Hands-on experience working in a cloud environment (Microsoft Azure strongly preferred)
Experience supporting and troubleshooting complex, cloud-native applications in production environments
Strong understanding of relational databases and solid experience writing and troubleshooting SQL queries
Ability to read and understand application code (preferably C#/.NET) to support debugging and issue resolution
Experience working with at least one CI/CD platform (e.g., Azure DevOps)
Familiarity with monitoring and observability tools (e.g., New Relic) and core concepts such as logs, metrics, and traces
Experience with scripting/automation (PowerShell preferred)
Strong analytical and problem-solving skills with attention to detail
Clear written and verbal communication skills
Experience working with Linux-based systems
Experience working with Kubernetes and containerized systems
Exposure to infrastructure-as-code tools (e.g., Terraform)
Familiarity with Git-based version control workflows
