Define and execute the platform engineering strategy across cloud infrastructure, ensuring alignment with business objectives and compliance requirements.
Oversee infrastructure operations across a multi-cloud environment (Azure, AWS, GCP).
Lead infrastructure evolution initiatives to improve availability, reduce costs, and enhance platform scalability.
Drive infrastructure cost optimization through continuous analysis, rightsizing, and efficient resource utilization.
Champion modern engineering practices including GitOps, Infrastructure as Code (Terraform), and Policy as Code across the organization.
Lead DevOps initiatives encompassing CI/CD pipelines, developer experience, and engineering tooling.
Implement and maintain observability frameworks to ensure comprehensive monitoring, logging, and alerting.
Establish and lead incident management processes, serving as a key coordination point between product development, security, and operations teams.
Drive resilience testing and chaos engineering practices to proactively identify and address system vulnerabilities.
Define and track SLIs, SLOs, and error budgets to maintain platform reliability and inform engineering priorities.
Lead post-incident reviews and ensure continuous improvement through actionable follow-ups.
Implement rigorous change management processes to minimize risk and ensure stable production environments.
Partner with product development teams to plan and coordinate rollouts, balancing velocity with reliability.
Establish and maintain release management standards and deployment strategies (blue-green, canary, feature flags).
Lead and mentor a globally distributed team across DevOps, SRE, Database Operations, and Incident Management functions.
Manage relationships with cloud providers and service vendors, including coordination of incident reviews and service quality meetings.
Build a culture of operational excellence, continuous learning, and knowledge sharing across time zones.
Ensure infrastructure operations maintain compliance with SOC 2 and ISO 27001 standards.
Support FedRAMP readiness initiatives in collaboration with security and compliance teams.
Partner with the security organization to implement infrastructure security controls and audit requirements.
Requirements
15+ years of experience in infrastructure, DevOps, or platform engineering roles, with at least 5 years in a leadership position within a SaaS or high-growth technology environment.
Proven experience operating and optimizing cloud infrastructure at scale, with hands-on expertise managing cloud spend exceeding $1M annually.
Strong technical expertise in multi-cloud environments (Azure, AWS, GCP) and container orchestration platforms (Kubernetes).
Deep experience with Infrastructure as Code, GitOps workflows, and Policy as Code frameworks.
Demonstrated success leading infrastructure migrations, platform modernization, and large-scale technical projects.
Proven track record driving incident management programs, including on-call processes, escalation procedures, and post-incident reviews.
Strong experience implementing and managing change management processes in production SaaS environments.
Excellent communication skills with demonstrated ability to lead globally distributed teams across multiple time zones.
Working knowledge of compliance frameworks (SOC 2, ISO 27001) and their infrastructure implications.
Tech Stack
AWS
Azure
Cloud
Google Cloud Platform
Kubernetes
Terraform
Benefits
Competitive regional compensation and equity package