Nebius is leading a new era in cloud computing to serve the global AI economy. The company is seeking a Senior Hardware Engineer to support its expanding North American operations, focusing on the design, deployment, and maintenance of high-performance cloud systems optimized for AI workloads.
Responsibilities:
- Participate in the design, deployment, and maintenance of high-performance cloud systems optimized for AI workloads
- Arrange and perform hardware R&D tests and experiments on-site in data center environments
- Troubleshoot and resolve complex system issues related to GPUs, networking (InfiniBand, NVLink), PCIe, and server infrastructure
- Conduct deep investigations into hardware, software, and networking issues to ensure optimal system performance and reliability
- Develop and execute test plans and methodologies for advanced GPU, InfiniBand, and compute systems to benchmark and validate performance
- Collaborate closely with cross-functional engineering and operations teams to improve system performance and reliability
- Monitor system performance and continuously fine-tune configurations for maximum efficiency
Requirements:
- Strong knowledge of modern server architecture, particularly in high-performance, GPU-based environments
- Hands-on experience with GPUs, networking (InfiniBand, NVLink), and PCIe technologies
- Proficiency with Linux systems, including experience using Python and Bash for automation and tooling
- Demonstrated ability to troubleshoot complex hardware, software, and networking issues
- Experience with deep problem investigation, root cause analysis, and performance optimization in cloud or high-performance computing environments
- Strong analytical and problem-solving skills with a performance-first mindset
- Basic electronics modification skills, including soldering and wiring
- Knowledge of the Linux kernel and experience with kernel-level debugging or troubleshooting
- Familiarity with electronic measurement equipment such as oscilloscopes and multimeters