Leidos is a prominent company providing engineering support to the Navy Marine Corps Intranet (NMCI). They are seeking a Site Reliability Automation and Orchestration Engineer to enhance their operations through automation, integration, and release management in a high-visibility DoD program.

Responsibilities:

Provide continuous development and continuous integration to include site reliability engineering and integration, release management, implementation & migration, and training & knowledge transfer
Automate routine tasks and software updates; document and maintain functional, integration, security, and load/stress testing procedures
Document and deliver an approach for automated acceptance testing and integration/regression testing of all new applications and capabilities
Design Ansible playbooks, develop roles, and test them in non-production environments, then migrate them to production networks
Participate in the development of the release management process, procedures, and policies
Establish, manage, update, and maintain the overall Release Management Plan and Release Schedule
Conduct site surveys, as necessary, to assess existing equipment and software being used to validate release package requirements and dependencies
Develop, document, and maintain work instruction materials for related Automation and Orchestration processes
Provide training when substantive technological changes are introduced

Requirements:

Secret Clearance
IAT Level II Baseline Certification (e.g. CCNA Security, CySA+, GICSP, GSEC, Security+ CE, CND, SSCP)
BS and 8+ years of hands-on site reliability engineering automation, ideally with the federal government; additional experience may be considered in lieu of a degree
Experience working in DevOps, Continuous Delivery (CI/CD), and Agile environments using code delivery mechanisms, continuous build systems, code repositories, and continuous delivery solutions
Must be a US Citizen
Must be able to support program execution in classified environments and access SIPRNet from an NMCI location on short notice (local travel)
Experience with automated script design, coding, debugging, and maintenance (using Bash, Python, etc.) preferred
Experience with CI/CD toolsets (e.g. Jenkins, GitLab)
Experience with Containerization (Docker) and Container Orchestration (Kubernetes)
Experience with chaos engineering practices and tools such as Chaos Monkey, Gremlin, or similar frameworks
Good command of Linux/Unix and command line knowledge
Experience in application administration, configuration, and integration
Familiarity with agile development methodologies
Skill and discipline to work with a distributed team
Ability to work in a highly collaborative, forward-thinking, and innovation-driven environment
Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to grow that knowledge
Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.)
Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations
Experience administering/maintaining an SRE platform via Ansible playbooks (e.g. upgrading Jenkins)
Experience in automating tasks with scripting languages like PowerShell or Python
Experience integrating with and maintaining various third-party CI/CD tools like Jenkins and GitLab
Experience with PaaS using Red Hat OpenShift/Kubernetes and Docker containers
Experience with commercial cloud infrastructure deployment environments such as AWS and Azure
Experience with automated provisioning and configuration tools like Terraform, CloudFormation, Chef, Puppet, Ansible, or similar technologies
Working knowledge of the Risk Management Framework (RMF) and DISA STIGs
Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation for automating test environments
ITILv4, Scrum Master, or Agile SAFe certification(s) or applicable experience

Site Reliability Engineering (SRE) Automation Engineer
