Anthropic is a public benefit corporation focused on creating reliable and beneficial AI systems. The Cluster Deployment Engineer will own the deployment strategy for large-scale AI compute clusters, ensuring they are organized and operationally effective across the datacenter fleet.
Responsibilities:
- Own cluster-level deployment strategy — define how AI compute clusters are organized across the floor, how racks interconnect, and how cluster topology requirements translate into facility and deployment scope across a portfolio of sites
- Set rack interface standards spanning power, network, mechanical, thermal, and spatial domains, and ensure that every deployment includes the complete set of infrastructure required to bring a cluster online
- Drive multi-threaded cluster bring-up programs across hardware, networking, power, and cooling — owning plans, dependencies, and critical paths from hardware specification through energization and turn-up
- Partner with internal engineering teams — research, systems, networking, and hardware — to translate cluster requirements into deployable facility scope, and to derisk onboarding of new hardware platforms well ahead of delivery
- Lead external partner execution with developers, engineering firms, OEMs, and construction teams, driving technical reviews, deviation management, and handoffs that keep deployments on schedule and within specification
- Improve cluster turn-up reliability and repeatability — identify systemic gaps in deployment scope, tooling, and partner interfaces, and drive durable fixes that reduce time-to-serve for new capacity
- Define and track deployment KPIs — cluster readiness, schedule adherence, scope completeness, time-to-first-packet — and use historical trends to forecast risk and inform capacity planning
- Coordinate cross-functional readiness across supply chain, security, operations, and construction to ship production-ready compute capacity
- Provide crisp executive visibility on deployment progress, tradeoffs, and risks across a portfolio of concurrent cluster programs
- Design cluster interfaces for durability — define rack and cluster-level interfaces that remain robust across hardware generations, so that facility scope and deployment models do not need to be reinvented every time the underlying hardware changes
- Build cluster layout and BOM tooling — create and maintain the tools, templates, and data models that turn cluster topology and rack specifications into accurate floor layouts, deployment sequences, and complete bills of materials, replacing one-off spreadsheets with repeatable, auditable workflows
Requirements:
- 10+ years of experience in hyperscale datacenter environments, with senior-level responsibility for cluster deployment, large-scale IT integration, or equivalent infrastructure programs
- Have delivered AI, HPC, or high-density compute clusters at scale and developed a strong intuition for the constraints that govern cluster deployment: interconnect reach, adjacency, power density, and thermal limits
- Operate fluently across the boundary between IT hardware and facility infrastructure, and have set interface standards that held up across multiple hardware generations and sites
- Have led cross-functional programs with both internal engineering teams and external developers, engineering firms, and OEM partners, and are effective at driving alignment across organizational levels
- Combine strong systems thinking with execution discipline — comfortable zooming from cluster topology and portfolio strategy down to the specific interface detail that will otherwise become a field issue
- Communicate clearly with technical and executive audiences, and can distill complex, multi-disciplinary programs into decisions and tradeoffs leadership can act on
- Thrive in ambiguous, fast-moving environments where the hardware, the scale, and the requirements are all changing simultaneously
- Hold a Bachelor's degree in Electrical Engineering, Mechanical Engineering, Computer Engineering, or equivalent practical experience
Strong candidates may also have:
- Direct experience deploying leading-edge AI accelerator clusters at hyperscale
- Experience shaping reference designs, deployment standards, or cluster-level playbooks that were adopted across a fleet
- Experience working across multiple geographies and an understanding of how regional codes, climate, utility constraints, and supply chains shape cluster-level decisions
- Experience partnering closely with hardware and system providers on long-term platform onboarding and bring-up
- Experience building the program mechanisms — roadmaps, milestones, risk registers, runbooks — that make delivery predictable at massive scale