YPO is the world's most influential community of chief executives, connecting more than 35,000 extraordinary business leaders. It is seeking a Senior DevOps Engineer to design, build, and operate the cloud infrastructure and developer platform that will power its next generation of products, with a focus on a rapidly scaling AI-first mobile platform.
Responsibilities:
- Own the architecture and day-to-day operation of YPO’s cloud infrastructure across its full lifecycle
- Architect, implement, and continuously evolve infrastructure across AWS, Azure, and/or GCP — ensuring scalability, resilience, and cost-efficiency
- Design and manage multi-region, highly available environments for a global AI-first platform
- Own cloud cost management and FinOps practices — including tagging strategies, reserved capacity planning, and anomaly detection
- Manage DNS, CDN, load balancing, and networking configurations to ensure global performance and failover
- Codify everything — if it cannot be automated, it should be questioned
- Lead Infrastructure as Code using Terraform — ensuring all infrastructure is version-controlled, tested, and deployed through automation
- Define IaC standards, module structures, and governance practices across engineering
- Automate provisioning, teardown, and configuration management across all environments
- Build automation pipelines for operational tasks (certificate rotation, secrets rotation, compliance remediation, drift detection)
- Write clean, well-tested automation scripts (Python, Bash)
- Design and maintain CI/CD pipelines across mobile (iOS/Android), backend APIs, AI platform, and data workloads
- Implement branch strategies, environment promotion workflows, and feature flagging patterns
- Integrate automated quality gates (testing, security scanning, container scanning, IaC linting)
- Lead adoption of progressive delivery techniques (blue-green, canary, traffic shifting)
- Partner with Security to embed secure-by-default practices across all releases
- Own release documentation, change management workflows, and deployment runbooks
- Design and operate Kubernetes environments (EKS, AKS, GKE)
- Manage container image governance, scanning, and lifecycle policies
- Implement service mesh, ingress controllers, and network policies
- Improve developer self-service through internal developer platforms and tooling
- Support service evolution as part of YPO’s digital transformation roadmap
- Design a comprehensive observability stack (metrics, logs, traces, synthetic monitoring)
- Define and enforce SLIs, SLOs, and error budgets
- Build dashboards, alerting, and on-call runbooks to improve MTTD and MTTR
- Lead blameless post-mortems and continuous improvement efforts
- Own capacity planning and performance benchmarking
- Embed security controls and policy-as-code across infrastructure and pipelines
- Implement and manage secrets management solutions
- Enforce cloud security baselines and auto-remediation
- Support compliance programmes (SOC 2, ISO 27001) through automation and auditability
- Manage network security and zero-trust architecture patterns
- Own internal developer experience — reducing friction and enabling self-service
- Define standards, documentation, and operational best practices
- Mentor engineers and build DevOps literacy across the organisation
- Act as a cross-functional bridge across Product, Engineering, AI/Data, and Security
- Contribute to technology decisions with clear trade-off analysis
Requirements:
- 5+ years of experience in DevOps, platform engineering, or SRE, including 2+ years at senior or lead level
- Strong cloud expertise (AWS preferred; Azure/GCP exposure)
- Terraform / Infrastructure as Code proficiency (required)
- CI/CD pipeline design across multiple workload types
- Strong Kubernetes experience in production environments
- Python and/or scripting for automation
- Networking fundamentals (DNS, TCP/IP, TLS, load balancing, CDN)
- Observability tooling (Datadog, Prometheus, Grafana, etc.)
- Experience with IAM, secrets management, and security practices
- Strong communication and cross-functional collaboration
- Mobile release pipelines (iOS/Android)
- AI/ML infrastructure or data pipeline experience
- Platform engineering tools (Backstage, internal developer portals)
- FinOps / cost optimisation tooling
- Multi-region architecture and global systems experience
- Experience in global SaaS or membership platforms