Principal System Software Engineer – Data Center MODS
Santa Clara, California, United States of America
Full Time
$272,000 - $431,250 USD
H1B Sponsor
Key skills
Distributed Systems, Linux, Python, C++, C, AI, Leadership
About this role
Role Overview
Define the technical strategy and drive development of NVIDIA's Data Center diagnostic systems, orchestrating large-scale stress testing of CPUs, GPUs, networking, memory, and high-speed interconnects.
Mentor and grow engineering teams, providing technical leadership and encouraging a culture of innovation and excellence.
Drive root-cause analysis of systemic failures that span multiple hardware and software domains.
Partner with cloud service providers (CSPs) to diagnose and address scalability challenges within their unique data center infrastructures.
Requirements
Bachelor's degree in Computer Science/Engineering, Electrical Engineering, or a related field (or equivalent experience).
15+ years of system software experience on highly resilient distributed systems, with programming experience in C++ or Python.
Deep systems knowledge of x86/ARM architectures, Linux OS internals, firmware (UEFI/BIOS), Redfish, HMC, BMC protocols and platform security.
Consistent track record of technical leadership, including leading project teams and setting technical direction.
Expertise in software testing methodologies with an automation-led, AI-first approach to ensuring software quality.