VP of Cloud Engineering, Operations & Delivery

United States of America

Full Time

$200,000 - $225,000 USD

H1B Sponsor Likely

Key skills

AWS, Azure, Google Cloud Platform, Infrastructure as Code, Terraform, GitHub, Jenkins, Qualys, SonarQube, Kubernetes, EKS, AKS, GKE, CI/CD automation, Event-driven architecture, IAM design, Network segmentation, Secrets management, SOC 2, NIST, CIS, HIPAA, PCI-DSS, AI agents, Multi-agent systems, LangGraph, AutoGen, Amazon Bedrock Agents, Azure AI Agent Service, Vertex AI Agent Builder, SRE practices, SLOs, Error budgets, Incident management, Blameless postmortems, FinOps, Datadog, CloudWatch, Policy-as-code, OPA, Sentinel, Checkov, AI, ML, LLM, RAG, Agentic, GCP, Serverless, IAM, Bedrock, Vertex AI, GitOps, CI/CD, Leadership, Communication, Sales, Cloud Security

About this role

EXL is seeking an experienced VP of Cloud Engineering, Operations & Delivery to lead its cloud practice across multiple industry verticals. The role involves providing technical leadership, driving AI strategy, and ensuring operational reliability while managing high-performing teams that deliver complex multi-cloud solutions.

Responsibilities:

Serve as the senior technical authority for cloud architecture and infrastructure decisions across AWS, Azure, and GCP
Advance and mature our Infrastructure as Code (IaC) practices and supporting toolchain (GitHub, Jenkins, Terraform, Qualys, SonarQube, etc.), ensuring consistency, security, and scalability across client environments
Provide meaningful technical guidance and architectural direction to engineering teams — going beyond high-level oversight to engage substantively on design decisions, standards, and delivery quality
Guide adoption of cloud-native patterns including Kubernetes (EKS/AKS/GKE), serverless, CI/CD automation, and event-driven architecture
Lead architecture reviews and serve as the escalation point for complex technical challenges
Ensure security and compliance are embedded into infrastructure from the ground up — spanning IAM design, network segmentation, secrets management, and frameworks such as SOC 2, NIST, CIS, HIPAA, and PCI-DSS
Champion the adoption of AI agents and multi-agent systems to transform how cloud infrastructure is built, operated, and optimized — moving teams from reactive, manual workflows to intelligent, autonomous execution
Identify high-value opportunities to introduce agentic workflows into engineering operations — including infrastructure provisioning, incident detection and remediation, cost optimization, compliance monitoring, security response, and deployment pipelines
Lead the evaluation and adoption of agentic AI frameworks and platforms (e.g., LangGraph, AutoGen, Amazon Bedrock Agents, Azure AI Agent Service, Vertex AI Agent Builder) to build purpose-built agents that extend the capabilities of our engineering teams
Define governance, guardrails, and human-in-the-loop checkpoints for agentic systems operating in cloud environments — ensuring autonomous actions are safe, auditable, and aligned with client expectations
Collaborate with engineering and solutions teams to design agentic delivery pipelines — where AI agents assist in code generation, IaC validation, drift detection, security scanning, and release orchestration
Work with peer technology teams to identify process transformation opportunities, helping envision, roadmap, and execute an agentic future state for cloud operations and engineering workflows
Stay ahead of the rapidly evolving AI agent ecosystem and bring informed, practical perspectives on what is production-ready versus experimental
Own the operational health of cloud environments across the client portfolio — including availability, performance, security posture, and cost efficiency
Mature SRE practices across the organization: SLOs, error budgets, incident management, and blameless postmortems
Drive FinOps discipline — optimizing cloud spend through right-sizing, commitment strategies, tagging governance, and anomaly detection — increasingly augmented by AI-driven insights and autonomous recommendations
Define and enforce observability standards across logging, metrics, and tracing using Datadog and CloudWatch — and explore how agentic monitoring can move teams from alert fatigue to autonomous resolution
Lead end-to-end delivery of cloud engineering engagements — from technical discovery and architecture through deployment, cutover, and steady-state operations
Build scalable delivery frameworks, runbooks, and IaC-driven playbooks that can be applied consistently across verticals and client environments — and actively work to make those playbooks AI-executable over time
Proactively identify technical risks and drive resolution before they become client issues
Build, mentor, and retain a high-performing team of cloud engineers, DevOps engineers, SREs, and delivery managers — cultivating a team culture that embraces AI-augmented workflows as a force multiplier, not a threat
Define clear career ladders, engineering standards, and technical growth paths that attract and retain top talent — including emerging skills in AI/ML infrastructure, prompt engineering, and agentic system design
Foster a culture of engineering excellence, continuous learning, and genuine curiosity about what AI agents can unlock
Communicate cloud strategy, delivery status, and technical decisions clearly to executive stakeholders — both internally and with clients
Help clients articulate and develop their agentic transformation roadmap — translating the potential of AI agents into concrete, phased business outcomes
Participate in pre-sales and client-facing conversations with enough technical depth to build confidence and credibility
Translate cloud provider roadmaps — including rapidly evolving AI and agent capabilities from AWS, Azure, and GCP — into strategic investments and differentiated service offerings
Represent the engineering organization in leadership discussions, helping align technical capabilities with business growth objectives

Requirements:

12+ years of experience in cloud infrastructure, platform engineering, or DevOps — with at least 4 years in a senior leadership capacity
Strong working knowledge of AWS, Azure, and GCP — you understand how these platforms work in practice, not just in principle; professional-level certifications are a plus
Solid, proven experience with Infrastructure as Code — particularly Terraform — including best practices around module design, state management, GitOps workflows, and policy enforcement
Demonstrated experience leading cloud delivery programs for enterprise clients across multiple industries
Practical exposure to AI agents and agentic frameworks: you have built, deployed, or operated AI agent systems in a production or near-production context and understand how to design reliable, governed agentic workflows
A creative, process-transformation mindset — you look at how work gets done today and can credibly envision how intelligent agents could do it better, faster, and more reliably tomorrow
Working knowledge of Kubernetes in production environments and modern CI/CD practices
Familiarity with cloud security frameworks and compliance requirements relevant to multi-vertical client environments
A track record of building and developing high-performing engineering teams
Exceptional communication skills — able to engage engineers at a technical level and translate that into clear, confident messaging for executives and clients alike
Hands-on experience with agentic AI platforms such as LangGraph, AutoGen, Amazon Bedrock Agents, Azure AI Agent Service, or Vertex AI Agent Builder
Experience designing multi-agent architectures — including agent orchestration, tool use, memory management, and human-in-the-loop design patterns
Familiarity with LLM integration patterns in cloud-native applications — RAG pipelines, vector databases, embedding workflows, and model hosting on cloud infrastructure
Experience in a managed services or solutions provider environment serving diverse industry verticals
Background in platform engineering or Internal Developer Platform (IDP) development
Familiarity with policy-as-code tools such as OPA, Sentinel, or Checkov
AWS, Azure, and/or GCP professional-level certifications, including any AI/ML specialty certifications