Own IAM platform reliability end‑to‑end: define SLOs/SLIs, error budgets, capacity plans, and resiliency roadmaps for core services (e.g., access administration, identity lifecycle, authentication, federation, PAM, directories)
Lead a 24×7 follow‑the‑sun operation: build and mature global on‑call rotations, incident response, and Major Incident Management (MIM) practices with clear RACI and runbooks
Engineer for resiliency: champion chaos testing, failure mode analysis, multi‑region/high‑availability patterns, DR/BCP validation, and automated health checks
Automate everything: drive “operations as code,” CI/CD for platform changes, immutable infrastructure, policy‑as‑code, automated compliance checks, and self‑service tooling for developers and operators
Manage risk & compliance: ensure controls align with regulatory requirements (e.g., SOX, FFIEC, GLBA, and PCI DSS where applicable), and uphold identity governance, separation of duties, and auditable change management
Elevate observability: standardize logging, metrics, tracing, and actionable alerting; implement service catalogs, golden signals, and error‑budget policies
Optimize cost & performance: track usage, right‑size capacity, and tune platform configurations while maintaining security and reliability objectives
Lead people & culture: attract, develop, and retain diverse SRE talent; set objectives, coach managers/ICs, and foster a blameless post‑incident culture with continuous learning
Partner & influence: work with product owners, security architects, enterprise architecture, and risk partners to align roadmaps and deliver high‑impact outcomes
Stakeholder communication: deliver clear status, metrics, and executive‑ready updates on risk, reliability, and remediation programs
Requirements
7+ years of Information Security Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
3+ years of management or leadership experience
Deep experience operating cybersecurity platforms at scale (e.g., identity lifecycle, authentication/federation, directory/PKI, privileged access, secrets, cyber defense), or comparable experience
Proven track record establishing SLOs/SLIs and error budgets, driving reliability improvements through automation and engineering
Experience managing cloud (AWS/Azure/GCP) and/or containerized workloads (Kubernetes), infrastructure as code (Terraform/CloudFormation), and CI/CD
Demonstrated leadership in Major Incident response, post‑incident reviews, and production change management/governance in regulated environments
Strong understanding of security controls, identity governance, least privilege, and regulatory/audit expectations
Excellent communication skills, with the ability to influence senior executives and partner across Cybersecurity, Risk, and Engineering
Tech Stack
AWS
Azure
Cloud
Cybersecurity
Google Cloud Platform
Kubernetes
Terraform
Benefits
Health benefits
401(k) Plan
Paid time off
Disability benefits
Life insurance, critical illness insurance, and accident insurance