Leidos is an industry and technology leader serving government and commercial customers. They are seeking a Site Reliability Automation and Orchestration Engineer to provide engineering support for the U.S. Navy’s Service Management, Integration, and Transport program, focusing on automating system operations and orchestrating network operations.
Responsibilities:
- Provide continuous development and continuous integration support, including site reliability engineering and integration, release management, implementation and migration, and training and knowledge transfer
- Automate routine tasks and software updates; document and maintain functional, integration, security, and load/stress testing procedures
- Document and deliver an approach for automated acceptance testing and integration/regression testing of all new applications and all new capabilities
- Perform Ansible playbook design, role development, and testing in non-production environments, as well as migration to production networks
- Participate in the development of the release management process, procedures, and policies
- Establish, manage, update, and maintain the overall Release Management Plan and Release Schedule
- Conduct site surveys, as necessary, to assess existing equipment and software being used to validate release package requirements and dependencies
- Develop, document, and maintain work instruction materials for related Automation and Orchestration processes
- Provide training when substantive technological changes are introduced
Requirements:
- Secret Clearance
- IAT Level II Baseline Certification (e.g. CCNA Security, CySA+, GICSP, GSEC, Security+ CE, CND, SSCP)
- BS and 8+ years of hands-on site reliability engineering automation, ideally with the federal government; additional experience may be considered in lieu of a degree
- Experience working in DevOps, Continuous Delivery (CI/CD), and Agile environments using code delivery mechanisms, continuous build systems, code repositories, and continuous delivery solutions
- Must be a US Citizen
- Must be able to support program execution in classified environments and access SIPRNet from an NMCI location on short notice (local travel)
- Experience with automated script design, coding, debugging, and maintenance (using Bash, Python, etc.) preferred
- Experience in CI/CD toolsets (e.g. Jenkins, GitLab, etc.)
- Experience with Containerization (Docker) and Container Orchestration (Kubernetes)
- Experience with chaos engineering practices and tools such as Chaos Monkey, Gremlin, or similar frameworks
- Good command of Linux/Unix and the command line
- Experience in application administration, configuration, and integration
- Familiarity with agile development methodologies
- Skill and discipline to work with a distributed team
- Ability to work in a highly collaborative, forward thinking, and innovation-driven environment
- Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to grow that knowledge
- Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.)
- Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations
- Experience administering/maintaining an SRE platform via Ansible playbooks (e.g. upgrading Jenkins)
- Experience in automating tasks with scripting languages such as PowerShell or Python
- Experience integrating with and maintaining various third-party CI/CD tools such as Jenkins and GitLab
- Experience with PaaS using Red Hat OpenShift/Kubernetes and Docker containers
- Experience with commercial cloud infrastructure deployment environments such as AWS and Azure
- Experience with automated provisioning and configuration tools like Terraform, CloudFormation, Chef, Puppet, Ansible, or similar technologies
- Working knowledge of the Risk Management Framework (RMF) and DISA STIGs
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation for automating test environments
- ITILv4, Scrum Master, or Agile SAFe certification(s) or applicable experience