Cloud, AI, ML, LLM, MLOps, Performance Optimization, SaaS, CI/CD, Leadership, Budgeting, Communication, Decision Making
About this role
Role Overview
Lead and scale the Infrastructure / Core Platform domain, managing multiple teams responsible for cloud infrastructure, internal platforms, reliability, and core services.
Enable AI-first product delivery at scale by providing robust foundations for LLM workloads, agentic systems, data pipelines, model evaluation, and secure model serving.
Set a clear technical and operational vision for infrastructure that balances reliability, velocity, security, and cost.
Partner closely with Product, Security, and Engineering Leadership to align platform investments with business and product strategy.
Establish and evolve platform standards for CI/CD, observability, infrastructure-as-code, service ownership, and developer experience.
Lead teams responsible for cloud cost management, performance optimization, and scalability, especially as AI workloads grow.
Coach and develop senior+ Infrastructure engineers, setting clear expectations and growth paths, while raising the overall talent bar through strong hiring, onboarding, and succession planning.
Guide teams through change and ambiguity, including fast-evolving AI infrastructure needs and shifting organizational priorities.
Contribute to org-wide engineering strategy, planning, and execution as a senior member of the engineering leadership team.
Own strategic relationships with key Infrastructure and SaaS vendors, including contract negotiations and amendments, usage and spend oversight, roadmap alignment, and evaluation of new capabilities as platform needs evolve.
Partner closely with Finance and Procurement to lead cloud cost management and optimization, including cost allocation, forecasting, budgeting, and governance models that scale sustainably with growing AI workloads.
Requirements
5+ years of engineering management experience, including leading managers and/or multiple teams.
Strong prior experience as a software, platform, or infrastructure engineer, with the ability to engage deeply on technical strategy and trade-offs.
Proven experience owning production infrastructure at scale, including cloud platforms, reliability, security, and developer enablement.
Bonus: Interest in or exposure to AI/ML systems (such as LLM workloads, data pipelines, model serving, MLOps, or evaluation frameworks), including learning or personal development in this area. Hands-on model development is not required.
A track record of data-driven decision making across reliability, cost, capacity, and delivery.
Ability to apply first-principles thinking to complex systems and organizational problems.
Strong change agility: comfortable leading teams through evolving architecture, tooling, and AI-driven shifts.
Excellent communication skills, with the ability to influence across Product, Security, and Engineering.
Based on the U.S. East Coast, with the ability to work effectively across the Boston, Washington DC, and Austin hubs.
Tech Stack
Cloud
Benefits
Health (medical, vision, dental), life, and disability insurance*
Equity stock options
Retirement plans
Paid public holidays and unlimited PTO
Paid maternity and parental leave
Leaves of absence (including caregiver leave and leave under CO's Healthy Families and Workplaces Act)