Deepwatch is a leader in managed security services, dedicated to protecting organizations from cyber threats. The Manager of Site Reliability Engineering will lead and mentor a high-caliber SRE team, focusing on the architecture, automation, and reliability of secure cloud infrastructure.
Responsibilities:
- Lead and grow the SRE team, setting direction, mentoring and managing engineers, and fostering excellence
- Design and manage cloud and containerized infrastructure with IaC (Terraform)
- Implement robust CI/CD pipelines integrating security and compliance
- Build scalable observability systems, leading the definition of SLIs / SLOs and dashboards
- Manage incident response, root cause analysis, and postmortems; automate recovery via playbooks/runbooks
- Drive capacity planning, performance tuning, and cost efficiency
- Collaborate with InfoSec, DevSecOps, and Compliance teams to ensure alignment with frameworks like FedRAMP, NIST, and RMF
- Support program-level initiatives, communicating effectively with stakeholders
- Promote a culture of reliability, security, and developer efficiency
- Maintain an active 'player' role, dedicating approximately 75% of your time to hands-on engineering (design, coding, and architecture) and 25% to leadership, mentorship, and management
Requirements:
- 8+ years in SRE, DevOps, or Platform Engineering, with technical leadership experience and readiness to step into management as a player/coach
- Proven experience with cloud platforms (AWS, GCP) and container orchestration (Kubernetes, Docker)
- Strong coding/scripting skills (Python, Go) and proficiency in IaC and GitOps
- Deep knowledge of observability tools and defining reliability metrics
- Experience with incident handling (PagerDuty, Datadog) and post-incident evaluations
- Demonstrated success in mentoring and developing junior/mid-level SRE talent, moving beyond delegation to hands-on technical coaching
- Familiarity with regulatory or cybersecurity frameworks (FedRAMP, NIST, STIGs, RMF)
- Excellent cross-functional communication and stakeholder management
- Certifications such as AWS, CKA, or cybersecurity credentials (e.g., OSCP)