The U.S. Air Force Cloud One Architecture and Common Shared Services contract currently has an opening for a Reliability Engineer.
The Reliability Engineer is responsible for ensuring the availability, performance, scalability, and resiliency of mission‑critical systems.
This role applies software engineering principles to infrastructure and operations, with a strong emphasis on automation, monitoring, incident response, and continuous reliability improvement.
The Reliability Engineer serves as the bridge between development, operations, and platform teams, ensuring that production systems consistently meet defined service level objectives (SLOs) while supporting rapid, safe delivery of new capabilities.
Requirements
Bachelor's degree and eight (8) or more years of experience, or Master's degree and six (6) or more years of experience. Additional experience may be accepted in lieu of a degree.
Active Secret clearance, at a minimum, required to start
U.S. citizenship required
Experience with cloud platforms (AWS, Azure, OCI, or GCP), including managed services
Experience with containerized environments (Docker, Kubernetes)
Familiarity with CI/CD pipelines and deployment automation
Strong understanding of:
Distributed systems and high‑availability architectures
SLOs and error budgets
Capacity modeling and performance testing