ESO is a fast-paced, growing company focused on data, technology, and research, passionate about improving community health and safety through the power of data. As a Site Reliability Engineer, you will ensure the reliability, scalability, and performance of production systems while collaborating closely with engineering and infrastructure teams to troubleshoot issues and enhance automation.
Responsibilities:
- Support and maintain production and non-production cloud environments (Azure/AWS)
- Troubleshoot complex, distributed, cloud-based applications to identify root causes and implement durable fixes
- Monitor system health, performance, and reliability using observability tools (e.g., New Relic, ELK, and Zabbix)
- Investigate application and database performance issues, including writing and optimizing SQL queries
- Participate in incident response, debugging, and post-incident reviews focused on continuous improvement
- Contribute to CI/CD pipelines (e.g., Azure DevOps) to improve automation, reliability, and deployment processes
- Write and maintain automation scripts (PowerShell, Bash, Python, or similar) to streamline operational workflows
- Collaborate with developers to understand code behavior and support troubleshooting efforts in C#/.NET-based systems
- Help improve reliability standards, documentation, and operational best practices
Requirements:
- Hands-on experience working in a cloud environment (Microsoft Azure strongly preferred)
- Experience supporting and troubleshooting complex, cloud-native applications in production environments
- Strong understanding of relational databases and solid experience writing and troubleshooting SQL queries
- Ability to read and understand application code (preferably C#/.NET) to support debugging and issue resolution
- Experience working with at least one CI/CD platform (e.g., Azure DevOps)
- Familiarity with monitoring and observability tools (e.g., New Relic) and core concepts such as logs, metrics, and traces
- Experience with scripting/automation (PowerShell preferred)
- Strong analytical and problem-solving skills with attention to detail
- Clear written and verbal communication skills
- Experience working with Linux-based systems
- Experience working with Kubernetes and container systems
- Exposure to infrastructure-as-code tools (e.g., Terraform)
- Familiarity with Git-based version control workflows