Granicus is a company focused on transforming the GovTech industry by building technology that connects governments with their constituents. They are seeking a Site Reliability Engineer (SRE 2) to ensure the reliability and performance of their services, collaborate closely with software engineers, and manage production support.

Responsibilities:

Provide production support in shifts according to the team's on-call roster
Work on tickets raised by customers and by internal engineering/implementation teams while not on call for production support
Work on SRE backlog items
Continuously monitor the health and performance of our services, systems, and infrastructure
Respond to alerts and incidents promptly to ensure high availability
Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
Participate in the design and implementation of system improvements to enhance reliability, scalability, and performance
Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team
Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
Implement and adhere to security best practices to protect our systems and data

Requirements:

Good understanding of networking and cloud services (Azure preferred, AWS, Google Cloud)
Experience with scripting languages such as PowerShell, Python, Bash, or Ruby
Strong understanding of SQL databases and ability to read and write SQL
Experience with monitoring and logging tools (e.g., Elastic), version control systems (e.g., Git), and CI/CD pipelines
Strong analytical and problem-solving skills with a proactive approach to identifying and addressing issues
Excellent verbal and written communication skills, with the ability to work effectively in a team environment
Eagerness to learn new technologies and improve existing skills
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
Minimum of four years of experience in SRE, DevOps, and production support
Relevant certifications such as AWS Certified Solutions Architect, Google Cloud Professional DevOps Engineer, or similar
