General Dynamics Information Technology is a global technology and professional services company that delivers consulting, technology, and mission services to the U.S. government and defense sectors. They are seeking a Site Reliability Engineer (SRE) to ensure the resilience, performance, and reliability of mission-critical Defense systems by blending software engineering, automation, and operations expertise.
Responsibilities:
- Design, build, and maintain highly available, scalable systems across cloud and on‑prem environments
- Develop automation solutions that improve observability, speed recovery, and eliminate manual operational work
- Implement monitoring, alerting, and performance tuning strategies that ensure system health
- Collaborate with development and infrastructure teams to design reliable architectures and CI/CD pipelines
- Conduct root cause analysis and drive systemic improvements to prevent future incidents
- Champion SRE best practices such as SLIs/SLOs, error budgets, and automated incident response
- Provide input to proposal operations in your area of subject matter expertise, collaborating on solution elements and writing narratives that describe the technical solution designed for a specific opportunity
Requirements:
- 15+ years of related experience
- U.S. citizenship required
- Candidate must possess an active Secret clearance to start and have the ability to obtain Top Secret/SCI
- Work Experience: 15+ years in system reliability, DevSecOps, cloud operations, or infrastructure engineering
- Education: Bachelor's degree with 15 years of experience, or an additional 4 years of work experience in lieu of a degree
- Strong scripting and automation skills (Python, Bash, PowerShell, etc.)
- Hands‑on experience with monitoring tools (Prometheus, Grafana, Splunk, ELK, Datadog, etc.)
- Familiarity with Kubernetes, container orchestration, and modern CI/CD pipelines
- Understanding of networking, Linux system internals, and distributed systems
- Ability to troubleshoot complex technical issues across the stack
- Experience supporting DoD or other federal programs
- Certifications such as Kubernetes (CKA/CKAD), AWS/Azure, or ITIL
- Experience implementing SRE frameworks at scale