SentinelOne is a company at the intersection of AI and security, pioneering a new operating model for cybersecurity. They are seeking a Senior Site Reliability Engineer to join their Government SRE team, responsible for ensuring the technical reliability of government environments and coordinating compliant deployments. The role involves collaborating with cross-functional teams to lead best practices for cloud infrastructure and continuous delivery in regulated environments.
Responsibilities:
- Drive continuous software delivery, resolve incidents, run postmortems, and create automation strategies for deployment, self-testing, and alerting
- Lead and execute incident management for production issues, ensuring rapid recovery, root cause analysis, and preventative follow-up actions
- Improve and optimize the observability strategy by collaborating with application engineering teams to design monitoring solutions that enhance alerting capabilities and reduce noise
- Define, implement, and monitor SLOs, SLIs, and SLAs in collaboration with product and engineering teams to align with business objectives
- Design, develop, and maintain software solutions that address operational, compliance, and pipeline challenges
- Own and coordinate all government environment releases, driving process improvements to enhance the release pipeline's efficiency, reliability, and visibility. Understand product architecture and service dependencies to manage risk and implement effective testing strategies
- Partner cross-functionally with engineering, product, SecOps, compliance, and leadership teams to align priorities, define testing strategies, and resolve challenges
- Ensure all infrastructure and deployments meet FedRAMP, government regulations, and industry standards, while maintaining required release documentation and risk assessments
Requirements:
- 5+ years of experience in SRE, DevOps, or Infrastructure Engineering for SaaS products, including 4+ years running operations at large scale
- 2+ years of production experience with a container orchestration system (Kubernetes preferred) and Continuous Delivery
- Strong understanding of compliance frameworks relevant to government deployments (e.g., FedRAMP, DoD, NIST 800-53, NIST 800-137)
- Multi-cloud experience in AWS/GCP (expertise within AWS preferred)
- Demonstrated experience with at least one main programming language (Python, Go, Ruby, etc.) and proficiency in Bash scripting to improve operational workflows
- Familiarity with GitOps frameworks, IaC tooling (Terraform or Pulumi), and deployment strategies (blue-green, rolling, and canary deploys)
- Experience with industry-standard observability stacks (Prometheus, Grafana, ELK, OpenTelemetry, etc.) and incident management processes
- Proven background implementing and supporting FedRAMP, security, risk management, and compliance processes for software releases
- Experience working directly with government agencies or in highly regulated industries
- Familiarity with testing strategies and automation in large-scale environments
- Due to Federal Government contract requirements, U.S. citizenship and a work location in the United States are required