Leidos is an industry and technology leader serving government and commercial customers with smarter, more efficient digital and mission innovations. The Site Reliability Automation and Orchestration Engineer will provide engineering support for the U.S. Navy’s Service Management, Integration, and Transport program, focusing on automating operations and orchestrating network processes.
Responsibilities:
- Provide continuous development and continuous integration support
- Automate routine tasks and software updates; document and maintain functional, integration, security, and load/stress testing procedures
- Document and deliver an approach for automated acceptance testing and integration/regression testing for all new applications and capabilities
- Design Ansible playbooks, develop roles, and test them in non-production environments before migrating them to production networks
- Participate in the development of release management processes, procedures, and policies
- Establish, manage, update, and maintain the overall Release Management Plan and Release Schedule
- Conduct site surveys, as necessary, to assess existing equipment and software in order to validate release package requirements and dependencies
- Develop, document, and maintain work instruction materials for Automation and Orchestration processes
- Provide training when substantive technological changes are introduced
Requirements:
- Bachelor's degree and 4–8 years of hands-on site reliability engineering automation experience, ideally supporting the federal government. Additional experience may be considered in lieu of a degree
- IAT Level II Baseline Certification (e.g., CCNA Security, CySA+, GICSP, GSEC, Security+ CE, CND, or SSCP)
- Experience working in DevOps, continuous integration/continuous delivery (CI/CD), and Agile environments using code delivery mechanisms, continuous build systems, code repositories, and continuous delivery solutions
- Active Secret Clearance
- Must be a U.S. Citizen
- Must be able to support program execution in classified environments and access SIPRNet from an NMCI location on short notice (local travel required)
- Experience with automated script design, coding, debugging, and maintenance using Bash, Python, or similar scripting languages preferred
- Experience with CI/CD toolsets such as Jenkins and GitLab
- Experience with containerization (Docker) and container orchestration (Kubernetes)
- Experience with chaos engineering practices and tools such as Chaos Monkey, Gremlin, or similar frameworks
- Strong Linux/Unix and command-line knowledge
- Experience in application administration, configuration, and integration
- Familiarity with Agile development methodologies
- Ability to work effectively with distributed teams
- Ability to work in a highly collaborative, forward-thinking, and innovation-driven environment
- Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to continue growing that knowledge
- Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.)
- Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations
- Experience administering and maintaining SRE platforms via Ansible playbooks (e.g., upgrading Jenkins)
- Experience automating tasks with scripting languages such as PowerShell or Python
- Experience integrating and maintaining third-party CI/CD tools such as Jenkins and GitLab
- Experience with PaaS environments using Red Hat OpenShift/Kubernetes and Docker containers
- Experience with commercial cloud infrastructure deployment environments such as AWS and Azure
- Experience with automated provisioning and configuration tools such as Terraform, CloudFormation, Chef, Puppet, Ansible, or similar technologies
- Working knowledge of the Risk Management Framework (RMF) and DISA STIGs
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation for automating test environments
- ITILv4, Scrum Master, or SAFe Agile certifications, or equivalent applicable experience