Nebius is leading a new era in cloud computing to serve the global AI economy. The Senior Data Center Deployment Engineer will own the end-to-end delivery, deployment, and production readiness of next-generation GPU platforms inside data centers, collaborating closely with hardware and infrastructure teams.
Responsibilities:
- Lead end-to-end deployment of GB-series racks within data center environments
- Oversee installation, bring-up, validation, and production readiness of NVIDIA H200- and B200-based servers
- Troubleshoot complex hardware, firmware, Linux OS, and networking issues
- Execute structured testing and validation procedures during deployment
- Develop and maintain basic Linux-based hardware health-check and diagnostic scripts
- Coordinate on-site hardware repairs, part replacements, and vendor escalations
- Drive root cause analysis and ensure corrective actions are implemented
- Manage and prioritize deployment timelines across multiple concurrent rollouts
- Provide technical leadership and guidance to on-site engineers and technicians
- Partner with networking and infrastructure teams to ensure seamless integration
- Document deployment processes, validation standards, and operational runbooks
Requirements:
- Strong hands-on experience deploying and operating data center infrastructure
- Deep familiarity with GPU-dense systems, ideally NVIDIA H-series platforms
- Experience working with high-density rack deployments (GB-series or similar)
- Solid Linux experience, including troubleshooting and scripting
- Ability to diagnose issues across hardware, OS, firmware, and network layers
- Experience coordinating field repairs and working directly with hardware vendors
- Proven experience leading technical teams or overseeing field operations
- Strong sense of ownership and ability to operate in production-critical environments
- Clear communication skills and ability to collaborate across distributed teams
Preferred qualifications:
- Experience deploying AI or HPC clusters at scale
- Familiarity with automated provisioning or infrastructure lifecycle systems
- Background in hardware qualification, burn-in testing, or factory validation
- Experience supporting rapid infrastructure expansion
- Exposure to ARM-based or heterogeneous compute environments