Zettabyte is building the infrastructure layer for an AI-first world, aiming to make AI compute ubiquitous and secure. The company is seeking a Staff Security Engineer to define and own the security architecture of its multi-tenant GPU cloud platform, addressing the platform's unique security challenges and ensuring compliance while maintaining engineering velocity.
Responsibilities:
- Own the end-to-end security architecture for multi-tenant Kubernetes GPU clusters
- Design tenant isolation, egress control, and network segmentation across compute, storage, and networking layers
- Define and implement runtime security and intrusion detection for untrusted AI workloads
- Build security primitives (identity, secrets, encryption, policy enforcement) that platform teams build on
- Secure the software supply chain, from CI/CD pipelines to container admission
- Lead threat modeling and security design reviews for new platform features
- Drive compliance readiness (SOC 2, ISO 27001) without slowing engineering velocity
- Act as a force multiplier: unblock teams, set standards, and raise the security bar across the org
- Lead security incident response and turn incidents into systemic improvements
Requirements:
- 7+ years of experience in security engineering for cloud-native, infrastructure, or distributed systems
- Deep, hands-on expertise in Kubernetes security (RBAC, PSA, network policies, admission controllers)
- Strong understanding of cloud security primitives in AWS, GCP, or Azure
- Experience building or operating runtime security and policy enforcement (Falco, Cilium, OPA, Calico, eBPF-based tools)
- Solid grounding in network security and zero-trust architectures
- Practical experience with secrets management and key management systems (Vault, cloud KMS)
- Strong automation skills in Go, Python, or Bash
- Proven ability to operate autonomously, make architectural decisions, and deliver in ambiguous environments
- Experience partnering deeply with platform, infra, and SRE teams
Nice to have:
- GPU isolation and virtualization security (MIG, SR-IOV)
- InfiniBand, RDMA, or high-performance networking
- HPC or large-scale multi-tenant compute platforms
- Security for AI/ML systems or data-intensive workloads
- Incident response leadership or red team experience
- Security certifications (CKS, OSCP, CISSP)
- Open-source contributions in security or cloud infrastructure