i4DM is a company that provides Federal agencies with skilled professionals to tackle complex mission challenges. The company is seeking a DevSecOps and Site Reliability Engineering (SRE) Technical Director to serve as the lead technical authority on DevSecOps practices and platform reliability engineering for VA enterprise healthcare applications.
Responsibilities:
- Serve as the senior technical authority for DevSecOps and Site Reliability Engineering (SRE) practices across platform services and hosted applications
- Establish and enforce engineering standards, DevSecOps practices, and reliability frameworks aligned with enterprise architecture and VA requirements
- Provide technical leadership and mentorship to DevSecOps and SRE engineering teams
- Oversee design, implementation, and maintenance of CI/CD pipelines supporting secure, automated, and repeatable application delivery
- Drive automation of build, test, deployment, and infrastructure provisioning processes using Infrastructure as Code (IaC)
- Ensure pipelines include automated security testing, quality validation, and compliance controls throughout the software delivery lifecycle
- Lead efforts to improve system reliability, scalability, performance, and operational efficiency across mission-critical environments
- Define and monitor reliability metrics (e.g., availability, latency, deployment success rates) and drive improvements in service stability
- Reduce operational toil through automation and proactive system improvements
- Support and guide adoption of cloud-native architectures, containerized environments (e.g., Kubernetes), and platform modernization initiatives
- Ensure infrastructure is scalable, resilient, and aligned with Federal cloud standards and best practices
- Drive continuous improvement of platform capabilities to support evolving healthcare application needs
- Ensure DevSecOps practices align with Federal security requirements, including NIST, Zero Trust principles, and VA cybersecurity policies
- Collaborate with cybersecurity teams to implement secure-by-design practices across infrastructure and application pipelines
- Support vulnerability management, secure configuration enforcement, and compliance validation within DevSecOps workflows
- Coordinate with Program Management, Engineering, Architecture, Monitoring, and Incident Management teams to ensure seamless integration of development and operations
- Partner with SRE, operations, and monitoring teams to improve observability, incident response, and system resilience
- Align DevSecOps and SRE practices with Agile and SAFe delivery methodologies to support continuous delivery
- Support incident response activities, including root cause analysis, remediation planning, and implementation of corrective actions
- Identify recurring issues and system weaknesses, and drive improvements in reliability and deployment practices
- Continuously evaluate and enhance engineering processes, tools, and automation frameworks to improve efficiency and system performance
Requirements:
- Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field
- 8+ years of experience in DevSecOps, Site Reliability Engineering (SRE), or platform engineering roles supporting enterprise or mission-critical environments
- Strong hands-on experience with cloud platforms (AWS preferred), Infrastructure as Code tools (e.g., Terraform), and configuration management tools (e.g., Ansible)
- Experience with container orchestration platforms (e.g., Kubernetes, EKS, ECS) and cloud-native application architectures
- Proven leadership experience guiding engineering teams within Agile or SAFe environments
- Experience supporting CI/CD pipelines, automation frameworks, and modern software delivery practices
- Strong understanding of system reliability, monitoring, and performance optimization principles
- Ability to collaborate across cross-functional teams in high-availability, 24x7 operational environments
- Experience scaling SRE practices across large, complex, multi-region cloud environments
- Eligibility to obtain and maintain a Public Trust clearance
- Experience supporting VA or Federal Government environments, including compliance with Federal cloud and security policies
- Experience implementing Zero Trust security principles within DevSecOps pipelines and cloud-native architectures
- Familiarity with observability tools, AIOps, and automation-driven reliability engineering practices
- Experience supporting large-scale enterprise modernization initiatives or healthcare application platforms
- SAFe, DevSecOps, or cloud-related certifications