AIS is a mission-driven company focused on making a difference through innovative projects. AIS is seeking a Reliability Engineer to ensure the availability and performance of its cloud platforms and services, manage incidents, and ensure compliance with production requirements.
Responsibilities:
- Responsible for the availability, performance, monitoring, and incident response of the cloud platforms and services, among other duties
- Ensures that everything deployed to production meets a set of general requirements, such as architecture diagrams, dependencies on other services, monitoring and logging plans, backups, and high-availability setups where appropriate
- Handles uncaught exceptions, hardware degradation, networking problems, high resource usage, and slow responses whenever they occur
- Uses metrics such as mean time to recover (MTTR) and mean time to failure (MTTF)
- Considered an emerging authority who applies extensive technical expertise
- Develops technical solutions to complex problems
- Exercises considerable latitude in determining objectives and approaches to assignments
Requirements:
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field (or equivalent experience)
- 8+ years of relevant experience supporting enterprise cloud and/or infrastructure environments
- Certifications: IAT-2 and one or more cloud certifications
- Active Secret clearance (or higher)
- Experience working in regulated environments and following secure engineering and documentation practices
- Experience supporting DoD/IC programs and mission systems