Switch is a company that designs, builds, and operates data centers, aiming to create the world’s most advanced digital infrastructure. The Senior Principal Reliability Engineer will serve as the senior technical authority for mechanical and electrical reliability, defining fleet-wide reliability strategies and leading maintenance programs to prevent unplanned downtime.
Responsibilities:
- Define and own fleet-wide reliability strategies for critical power and cooling infrastructure
- Establish engineering standards, redundancy models, and reliability frameworks across all campuses
- Architect and mature predictive and condition-based maintenance programs
- Implement and optimize RCM, FMEA, PM optimization, and asset criticality methodologies
- Develop remaining useful life models and failure forecasting approaches to reduce unplanned outages
- Define telemetry and sensor standards across BMS, EPMS, SCADA, and DCIM platforms
- Partner with controls and analytics teams to develop high-fidelity data models for reliability monitoring
- Lead failure investigations and convert root cause findings into engineering, maintenance, or operational improvements
- Drive systemic risk reduction initiatives across the global data center fleet
- Own asset lifecycle reliability from commissioning through end-of-life modeling
- Maintain global maintenance standards, templates, and procedural governance
- Mentor engineers and operations leaders on reliability methodology and analytical techniques
- Influence OEMs, design engineering, and construction teams to embed reliability into future deployments
Requirements:
- Bachelor's degree in Mechanical or Electrical Engineering required
- 12 or more years of experience in mission-critical or hyperscale data center environments
- Deep expertise in critical electrical systems such as UPS, medium- and low-voltage gear, and switchgear
- Deep expertise in mechanical systems such as CRAH or CRAC units, chillers, cooling towers, and pumping systems
- Proven experience in RCM, FMEA, RCA, predictive and condition-based maintenance, and reliability analytics
- Experience with control systems including BMS, EPMS, SCADA, and telemetry-enabled monitoring
- Experience applying statistical reliability modeling such as Weibull analysis or Monte Carlo simulation
- Preferred: experience with high-density cooling systems and advanced analytics or AI-enabled maintenance strategies