Design, build, and maintain scalable, reliable cloud infrastructure on Google Cloud Platform
Own and evolve our Kubernetes-based development and production environments
Develop and improve CI/CD pipelines that enable fast, safe, and repeatable deployments
Build and operate modern observability and incident response practices (monitoring, alerting, logging)
Proactively identify reliability risks and drive improvements before incidents happen
Expand and optimize global production deployments in a complex compliance environment
Automate manual and repetitive processes to reduce toil and human error
Ensure security, resilience, and compliance are built into everything we deploy
Mentor engineers and teams on SRE and DevOps best practices
Contribute to a strong DevOps culture where ownership and collaboration are the default
Requirements
5+ years of experience in SRE, DevOps, or similar infrastructure-focused roles
Hands-on experience with cloud environments (GCP preferred, AWS or Azure also relevant)
Strong systems engineering mindset and understanding of distributed systems
Comfortable working with agentic AI tools (LLM-based assistants and agents)
Proven ability to troubleshoot complex production incidents and drive effective resolutions
Experience building, scaling, and maintaining automated infrastructure and workflows
Comfortable working cross-functionally and guiding teams toward reliable engineering practices
Must-have technical skills: Kubernetes in production environments; Infrastructure as Code and Kubernetes configuration tooling (Terraform, Helm, Kustomize); modern CI/CD tooling and cloud-native ecosystems; proficiency in at least one programming language (JavaScript/TypeScript, Go, or Python)