EXL is seeking an experienced VP of Cloud Engineering, Operations & Delivery to lead its cloud practice across multiple industry verticals. The role involves providing technical leadership, driving AI strategy, and ensuring operational reliability while managing high-performing teams to deliver complex multi-cloud solutions.
Responsibilities:
- Serve as the senior technical authority for cloud architecture and infrastructure decisions across AWS, Azure, and GCP
- Advance and mature our Infrastructure as Code (IaC) practices and supporting toolchain (GitHub, Jenkins, Terraform, Qualys, SonarQube, etc.), ensuring consistency, security, and scalability across client environments
- Provide meaningful technical guidance and architectural direction to engineering teams — going beyond high-level oversight to engage substantively on design decisions, standards, and delivery quality
- Guide adoption of cloud-native patterns including Kubernetes (EKS/AKS/GKE), serverless, CI/CD automation, and event-driven architecture
- Lead architecture reviews and serve as the escalation point for complex technical challenges
- Ensure security and compliance are embedded into infrastructure from the ground up — spanning IAM design, network segmentation, secrets management, and frameworks such as SOC 2, NIST, CIS, HIPAA, and PCI-DSS
- Champion the adoption of AI agents and multi-agent systems to transform how cloud infrastructure is built, operated, and optimized — moving teams from reactive, manual workflows to intelligent, autonomous execution
- Identify high-value opportunities to introduce agentic workflows into engineering operations — including infrastructure provisioning, incident detection and remediation, cost optimization, compliance monitoring, security response, and deployment pipelines
- Lead the evaluation and adoption of agentic AI frameworks and platforms (e.g., LangGraph, AutoGen, Amazon Bedrock Agents, Azure AI Agent Service, Vertex AI Agent Builder) to build purpose-built agents that extend the capabilities of our engineering teams
- Define governance, guardrails, and human-in-the-loop checkpoints for agentic systems operating in cloud environments — ensuring autonomous actions are safe, auditable, and aligned with client expectations
- Collaborate with engineering and solutions teams to design agentic delivery pipelines — where AI agents assist in code generation, IaC validation, drift detection, security scanning, and release orchestration
- Work with peer technology teams to identify process transformation opportunities, helping envision, plan, and execute an agentic future state for cloud operations and engineering workflows
- Stay ahead of the rapidly evolving AI agent ecosystem and bring informed, practical perspectives on what is production-ready versus experimental
- Own the operational health of cloud environments across the client portfolio — including availability, performance, security posture, and cost efficiency
- Mature SRE practices across the organization: SLOs, error budgets, incident management, and blameless postmortems
- Drive FinOps discipline — optimizing cloud spend through right-sizing, commitment strategies, tagging governance, and anomaly detection — increasingly augmented by AI-driven insights and autonomous recommendations
- Define and enforce observability standards across logging, metrics, and tracing using Datadog and CloudWatch — and explore how agentic monitoring can move teams from alert fatigue to autonomous resolution
- Lead end-to-end delivery of cloud engineering engagements — from technical discovery and architecture through deployment, cutover, and steady-state operations
- Build scalable delivery frameworks, runbooks, and IaC-driven playbooks that can be applied consistently across verticals and client environments — and actively work to make those playbooks AI-executable over time
- Proactively identify technical risks and drive resolution before they become client issues
- Build, mentor, and retain a high-performing team of cloud engineers, DevOps engineers, SREs, and delivery managers — cultivating a team culture that embraces AI-augmented workflows as a force multiplier, not a threat
- Define clear career ladders, engineering standards, and technical growth paths that attract and retain top talent — including emerging skills in AI/ML infrastructure, prompt engineering, and agentic system design
- Foster a culture of engineering excellence, continuous learning, and genuine curiosity about what AI agents can unlock
- Communicate cloud strategy, delivery status, and technical decisions clearly to executive stakeholders — both internally and with clients
- Help clients articulate and develop their agentic transformation roadmap — translating the potential of AI agents into concrete, phased business outcomes
- Participate in pre-sales and client-facing conversations with enough technical depth to build confidence and credibility
- Translate cloud provider roadmaps — including rapidly evolving AI and agent capabilities from AWS, Azure, and GCP — into strategic investments and differentiated service offerings
- Represent the engineering organization in leadership discussions, helping align technical capabilities with business growth objectives
Requirements:
- 12+ years of experience in cloud infrastructure, platform engineering, or DevOps — with at least 4 years in a senior leadership capacity
- Strong working knowledge of AWS, Azure, and GCP — you understand how these platforms work in practice, not just in principle; professional-level certifications are a plus
- Solid, proven experience with Infrastructure as Code — particularly Terraform — including best practices around module design, state management, GitOps workflows, and policy enforcement
- Demonstrated experience leading cloud delivery programs for enterprise clients across multiple industries
- Practical exposure to AI agents and agentic frameworks — you've either built, deployed, or operated AI agent systems in a production or near-production context and understand how to design reliable, governed agentic workflows
- A creative, process-transformation mindset — you look at how work gets done today and can credibly envision how intelligent agents could do it better, faster, and more reliably tomorrow
- Working knowledge of Kubernetes in production environments and modern CI/CD practices
- Familiarity with cloud security frameworks and compliance requirements relevant to multi-vertical client environments
- A track record of building and developing high-performing engineering teams
- Exceptional communication skills — able to engage engineers at a technical level and translate that into clear, confident messaging for executives and clients alike
Preferred Qualifications:
- Hands-on experience with agentic AI platforms such as LangGraph, AutoGen, Amazon Bedrock Agents, Azure AI Agent Service, or Vertex AI Agent Builder
- Experience designing multi-agent architectures — including agent orchestration, tool use, memory management, and human-in-the-loop design patterns
- Familiarity with LLM integration patterns in cloud-native applications — RAG pipelines, vector databases, embedding workflows, and model hosting on cloud infrastructure
- Experience in a managed services or solutions provider environment serving diverse industry verticals
- Background in platform engineering or Internal Developer Platform (IDP) development
- Familiarity with policy-as-code tools such as OPA, Sentinel, or Checkov
- AWS, Azure, and/or GCP professional-level certifications, plus any relevant AI/ML specialty certifications