Leidos is a prominent company providing engineering support to the Navy Marine Corps Intranet (NMCI). They are seeking a Site Reliability Automation and Orchestration Engineer to enhance their operations through automation, integration, and release management in a high-visibility DoD program.
Responsibilities:
- Provide continuous development and continuous integration support, including site reliability engineering and integration, release management, implementation and migration, and training and knowledge transfer
- Automate routine tasks and software updates; document and maintain functional, integration, security, and load/stress testing procedures
- Document and deliver an approach for automated acceptance testing and integration/regression testing of all new applications and capabilities
- Design Ansible playbooks, develop roles, and test them in non-production environments, then migrate them to production networks
- Participate in the development of the release management process, procedures, and policies
- Establish, manage, update, and maintain the overall Release Management Plan and Release Schedule
- Conduct site surveys, as necessary, to assess existing equipment and software being used to validate release package requirements and dependencies
- Develop, document, and maintain work instruction materials for related Automation and Orchestration processes
- Provide training when substantive technological changes are introduced
Requirements:
- Secret Clearance
- IAT Level II Baseline Certification (e.g. CCNA Security, CySA+, GICSP, GSEC, Security+ CE, CND, SSCP)
- BS and 8+ years of hands-on site reliability engineering automation experience, ideally with the federal government; additional experience may be considered in lieu of a degree
- Experience working in DevOps, Continuous Delivery (CI/CD), and Agile environments using code delivery mechanisms, continuous build systems, code repositories, and continuous delivery solutions
- Must be a US Citizen
- Must be able to support program execution in classified environments and access SIPRNet from an NMCI location on short notice (local travel)
- Experience with automated script design, coding, debugging, and maintenance (using Bash, Python, etc.) preferred
- Experience with CI/CD toolsets (e.g. Jenkins, GitLab)
- Experience with Containerization (Docker) and Container Orchestration (Kubernetes)
- Experience with chaos engineering practices and tools such as Chaos Monkey, Gremlin, or similar frameworks
- Good command of Linux/Unix and command line knowledge
- Experience in application administration, configuration, and integration
- Familiarity with agile development methodologies
- Skill and discipline to work effectively with a distributed team
- Ability to work in a highly collaborative, forward thinking, and innovation-driven environment
- Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to grow that knowledge
- Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.)
- Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations
- Experience administering/maintaining an SRE platform via Ansible playbooks (e.g. upgrading Jenkins)
- Experience automating tasks with scripting languages like PowerShell or Python
- Experience integrating with and maintaining various third-party CI/CD tools like Jenkins and GitLab
- Experience with PaaS using Red Hat OpenShift/Kubernetes and Docker containers
- Experience with commercial cloud infrastructure deployment environments such as AWS and Azure
- Experience with automated provisioning and configuration tools like Terraform, CloudFormation, Chef, Puppet, Ansible, or similar technologies
- Working knowledge of the Risk Management Framework (RMF) and DISA STIGs
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation for automating test environments
- ITILv4, Scrum Master, or Agile SAFe certification(s) or applicable experience