Anthropic is a public benefit corporation focused on creating reliable and interpretable AI systems. The company is looking for a Data Center Engineer specializing in Resource Efficiency to optimize power and cooling across its AI infrastructure and ensure efficient compute allocation across its TPU/GPU fleet.
Responsibilities:
- Build models that forecast consumption across electrical and mechanical subsystems to inform capacity planning, energy procurement, and oversubscription targets and risks, including statistical modeling of cluster utilization, workload profiles, and failure modes
- Design IT/OT interfaces that bridge compute orchestration with facility controls, enabling real-time telemetry across accelerator hardware, power distribution, cooling, and schedulers
- Build and operate load management systems that use power and cooling topology to enable power- and thermal-aware placement, maximizing throughput while meeting SLOs
- Partner with data center providers to drive design optimizations and hold them accountable to SLA-grade performance standards, providing technical diligence on partner architectures
Requirements:
- Bachelor's degree in Electrical Engineering, Mechanical Engineering, Power Systems, Controls Engineering, or a related field
- 5+ years of experience in data center infrastructure or facility engineering
- Demonstrated experience with data center power distribution and cooling system architectures
- Experience building or operating software-based power management, load scheduling, or control systems
- Proficiency in Python or similar languages for statistical modeling, simulation, or automation of data center infrastructure optimizations
- Familiarity with SCADA, BMS, EPMS, or industrial control systems and associated protocols (Modbus, BACnet, SNMP)
- Track record of cross-functional collaboration across hardware, software, and facilities teams
Preferred Qualifications:
- Master's or PhD in Controls, Power Systems, or a related discipline, with 3+ years of experience in data center infrastructure or facility engineering
- Experience with accelerator-class deployments and their power management interfaces
- Background in control theory, dynamical systems, or cyber-physical systems design
- Experience with energy storage, microgrid integration, demand response, or behind-the-meter generation
- Familiarity with reliability engineering methods
- Experience with SLA development, availability modeling, or service credit frameworks
- Exposure to ML/optimization techniques applied to infrastructure or energy systems