Leidos, a leading company focused on delivering innovative solutions, is seeking a Site Reliability Engineer (SRE) to join its team. The role involves designing and implementing CI/CD pipelines, automating infrastructure, and collaborating with Agile software teams to ensure reliable and efficient software delivery.

Responsibilities:

Design, develop, troubleshoot, and maintain mission-critical infrastructure across cloud and on-premises environments using infrastructure-as-code (IaC)
Build and support scalable, highly available, and secure cloud-native architectures, including Kubernetes clusters and microservices deployments
Enable and optimize CI/CD pipelines by applying best practices for automated provisioning, configuration, testing, and deployment
Gather and analyze system and application metrics to support performance tuning, capacity planning, and proactive issue resolution
Partner with development teams to improve system reliability through rigorous testing, release processes, and continuous improvement initiatives
Participate in system design, platform engineering, and technical decision-making to ensure solutions meet functional, performance, and SLA requirements
Collaborate across engineering teams and stakeholders to deliver solutions, resolve technical challenges, and coordinate key deliverables
Develop prototypes, perform technical spikes, and evaluate new tools or approaches to solve complex technical problems
Continuously assess deployed systems and implement improvements to enhance reliability, scalability, and operational efficiency
Mentor team members and contribute to knowledge sharing across the organization

Requirements:

Bachelor's degree in Computer Science, Computer Engineering, or a related field, with 4+ years of relevant experience
Demonstrated ability to deliver projects or processes spanning multiple technical domains, including experience in a technical lead capacity
Solid understanding of Agile development practices, along with CI/CD methodologies and supporting tools
Strong proficiency with Linux and Windows operating systems, as well as networking fundamentals (e.g., HTTP, HTTPS, SSL/TLS, SMTP, DNS)
Hands-on experience provisioning and managing resources within cloud and IaaS environments (AWS, Azure, Google Cloud Platform, etc.)
Practical experience with infrastructure-as-code and automation tools such as Terraform, Ansible, CloudFormation, Chef, or Puppet
Experience working with container technologies (Docker) and orchestration platforms like Kubernetes, including use of kubectl
Proficiency with version control systems, such as Git
Demonstrated curiosity and initiative in learning new tools, frameworks, and technologies
Ability to work independently with minimal supervision while also collaborating effectively within cross-functional engineering teams
Experience with enterprise event streaming technologies such as Kafka or NATS
Familiarity with monitoring and observability tools like Grafana and Prometheus
Exposure to service mesh and API gateway technologies (e.g., Istio)
Experience with GitOps tools such as Argo CD, Flux CD, or similar platforms
Professional cybersecurity certification (e.g., Security+ or equivalent)
Working knowledge of relational database systems such as Oracle, MySQL, PostgreSQL, or SQL Server