Accela provides cutting-edge technology that empowers government agencies to better serve their communities. As a Senior Manager in Cloud Engineering & Operations, you will lead a high-performing team focused on ensuring the reliability and performance of Accela’s Civic Platform Services while overseeing customer reliability engineering functions.
Responsibilities:
- Hire, develop, and retain a high-performing team of SREs and CREs
- Establish clear performance expectations, career paths, and technical standards
- Foster a culture of ownership, accountability, and continuous improvement
- Serve as a mentor to senior engineers and emerging leaders
- Own reliability outcomes across availability, performance, security, and compliance
- Serve as the executive escalation point for high-severity production incidents
- Lead incident response strategy, root cause analysis (RCA), and corrective action planning
- Mature Incident, Problem, and Change Management processes
- Drive automation initiatives in partnership with DevOps, Security, and Database Engineering
- Oversee infrastructure scalability and resiliency across Microsoft Azure
- Champion Infrastructure as Code (Terraform) and configuration management (Ansible) standards
- Improve observability through robust dashboards, metrics, logging, and monitoring practices
- Partner with Product and Engineering to integrate reliability early in the product development lifecycle (PDLC)
- Engage executive leadership with operational metrics, risk posture, and strategic roadmaps
- Manage vendor and partner relationships supporting SaaS production environments
- Align reliability strategy with business growth and public sector compliance requirements
Requirements:
- 10+ years of experience in software engineering and/or production systems engineering within a SaaS environment
- 3+ years of people leadership experience managing technical teams
- Proven track record of building, scaling, and retaining high-performing engineering teams
- Strong executive communication skills with experience presenting operational metrics and strategy
- Deep expertise in distributed systems, system design, and troubleshooting complex production environments
- Experience operating in Microsoft Azure environments
- Experience with Linux environments and version control systems
- Strong scripting capability (Bash, Python, Ruby, or Go)
- Mastery of production monitoring, logging, and observability tools
- Demonstrated ability to lead full-stack incident response and root cause analysis
- Experience operating containerized platforms (AKS, ECS, or similar) at scale
- Azure CLI/API expertise
- Infrastructure as Code experience (Terraform preferred)
- Configuration management experience (Ansible preferred)
- PowerShell experience