Leidos is seeking a Site Reliability Engineer (SRE) to join its team. The role involves designing and implementing CI/CD pipelines, automating infrastructure, and collaborating with Agile software teams to ensure reliable and efficient software delivery.
Responsibilities:
- Design, develop, troubleshoot, and maintain mission-critical infrastructure across cloud and on-premises environments using infrastructure-as-code (IaC)
- Build and support scalable, highly available, and secure cloud-native architectures, including Kubernetes clusters and microservices deployments
- Enable and optimize CI/CD pipelines by applying best practices for automated provisioning, configuration, testing, and deployment
- Gather and analyze system and application metrics to support performance tuning, capacity planning, and proactive issue resolution
- Partner with development teams to improve system reliability through rigorous testing, release processes, and continuous improvement initiatives
- Participate in system design, platform engineering, and technical decision-making to ensure solutions meet functional, performance, and SLA requirements
- Collaborate across engineering teams and stakeholders to deliver solutions, resolve technical challenges, and coordinate key deliverables
- Develop prototypes, perform technical spikes, and evaluate new tools or approaches to solve complex technical problems
- Continuously assess deployed systems and implement improvements to enhance reliability, scalability, and operational efficiency
- Mentor team members and contribute to knowledge sharing across the organization
Requirements:
- Bachelor's degree in Computer Science, Computer Engineering, or a related field, with 4+ years of relevant experience
- Demonstrated ability to deliver projects or processes spanning multiple technical domains, including experience in a technical lead capacity
- Solid understanding of Agile development practices, along with CI/CD methodologies and supporting tools
- Strong proficiency with Linux and Windows operating systems, as well as networking fundamentals (e.g., HTTP, HTTPS, SSL/TLS, SMTP, DNS)
- Hands-on experience provisioning and managing resources within cloud and IaaS environments (AWS, Azure, Google Cloud Platform, etc.)
- Practical experience with infrastructure-as-code and automation tools such as Terraform, Ansible, CloudFormation, Chef, or Puppet
- Experience working with container technologies (e.g., Docker) and orchestration platforms such as Kubernetes, including use of kubectl
- Proficiency with version control systems, such as Git
- Demonstrated curiosity and initiative in learning new tools, frameworks, and technologies
- Ability to work independently with minimal supervision while also collaborating effectively within cross-functional engineering teams
- Experience with enterprise event streaming technologies such as Kafka or NATS
- Familiarity with monitoring and observability tools like Grafana and Prometheus
- Exposure to service mesh and API gateway technologies (e.g., Istio)
- Experience with GitOps tools such as Argo CD, Flux CD, or similar platforms
- Professional cybersecurity certification (e.g., Security+ or equivalent)
- Understanding of Agile development methodologies and practices
- Working knowledge of relational database systems such as Oracle, MySQL, PostgreSQL, or SQL Server