CentralSquare Technologies is a trusted provider of public sector software in North America, dedicated to supporting public servants with technology that makes communities safer. The Cloud Reliability Engineer will lead the architecture, design, and deployment of network solutions, ensuring the availability and reliability of systems that support AWS Cloud and hosted applications.
Responsibilities:
- Design, develop, install, and maintain software solutions that improve efficiency in Cloud Operations
- Work with engineering teams to refine deployment and release processes
- Collaborate with the engineering team on projects as the expert on reliability, performance, and efficiency
- Assist product engineers in development and deployment of backend applications
- Be prepared to explain your work, decisions, and ideas to your colleagues
- Participate in 24x7 operational support and on-call rotation shifts
- Ensure that all system designs and procedures are documented and up to date
- Consolidate existing documentation, and write new documentation where needed, to build a centralized body of knowledge for all team members
- Contribute to the upkeep of documentation to maintain relevancy and accuracy
- Provide training and education to Cloud Operations on infrastructure and internal tooling
- Provide security personnel with an appropriate level of audit and control
- Monitor systems to collect metrics for tuning and capacity planning
- Work to automate detection and resolution of recurring issues
- Build the whole stack, from load balancers to databases
- Ensure safety, predictability, repeatability and auditability of all build and deploy processes
- Provide technical leadership to other CentralSquare departments
- Develop, coach, and mentor individuals and teams, and ensure high performance in a fast-paced environment
- Build tools and automation that eliminate repetitive tasks and prevent incidents
Requirements:
- Bachelor's Degree (or equivalent experience) in computer engineering, computer science, engineering or information systems management
- Experience with AWS Cloud and Linux or UNIX software and systems
- Proficient in programming or scripting in Java, Python, or C
- Experience with Terraform, Agile practices, SaaS, PSQL, cloud architecture, and JavaScript
- Comfortable with automation scripting and debugging
- Natural collaboration skills and an eye on continuous improvement
- Skilled in scalability analysis and root cause analysis
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- CJIS Clearance: Onboarding for this role requires obtaining CJIS (Criminal Justice Information Services) clearance, a critical credential for safeguarding public safety data