Granicus is a company focused on transforming the GovTech industry by building technology that connects governments with their constituents. They are seeking a Site Reliability Engineer (SRE 2) to ensure the reliability and performance of their services while collaborating closely with software engineers and managing production support.
Responsibilities:
- Provide production support on a shift according to the team on-call roster
- Work on tickets raised by customers and by internal engineering/implementation teams while not on call for production support
- Work on SRE backlog items
- Continuously monitor the health and performance of our services, systems, and infrastructure
- Respond to alerts and incidents promptly to ensure high availability
- Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
- Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
- Participate in the design and implementation of system improvements to enhance reliability, scalability, and performance
- Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
- Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team
- Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
- Implement and adhere to security best practices to protect our systems and data
Requirements:
- Good understanding of networking and cloud services (Azure preferred, AWS, Google Cloud)
- Experience with scripting languages such as PowerShell, Python, Bash, or Ruby
- Strong understanding of SQL databases and ability to read and write SQL
- Experience with monitoring and logging tools (e.g., Elastic), version control systems (e.g., Git), and CI/CD pipelines
- Strong analytical and problem-solving skills with a proactive approach to identifying and addressing issues
- Excellent verbal and written communication skills, with the ability to work effectively in a team environment
- Eagerness to learn new technologies and improve existing skills
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
- Minimum of four years of experience in SRE, DevOps, and production support
- Relevant certifications such as AWS Certified Solutions Architect, Google Cloud Professional DevOps Engineer, or similar