Architect and manage the global cloud infrastructure for our Zero Trust Networking Services
Own the delivery pipeline and operational health of a massive-scale, multi-region distributed system
Design and implement a multi-region AWS architecture, and lead the development of reusable Terraform modules that automate provisioning across geographically diverse regions
Architect self-healing infrastructure using advanced cloud load balancing, auto-scaling patterns, and Multi-AZ database topologies to ensure high availability
Modernize CI/CD pipelines and implement blue/green and canary deployment strategies to deliver zero-downtime upgrades for a continuously running global network service
Build comprehensive SRE dashboards and implement proactive alerting frameworks that detect regional outages or capacity exhaustion before they impact customers
Monitor cloud resource utilization and implement scaling policies that balance performance requirements against cost efficiency
Requirements
12+ years of overall experience in Software Engineering, DevOps, or Site Reliability Engineering, plus a BS/MS in Computer Science or a related field
Deep mastery of AWS services, including architecting AWS-managed SQL and NoSQL data stores, with a focus on scalable single-region and multi-region deployment strategies
Advanced expertise in Infrastructure as Code using Terraform, and expert proficiency in Python or Go for building automation tooling
Strong knowledge of Linux/BSD internals, observability stacks (Prometheus, InfluxDB), and security and compliance practices (PKI, IAM, DevSecOps)
U.S. citizenship, required due to the nature of the customers assigned to this role