NVIDIA is a leading technology company known for pioneering visual computing and AI. The company is seeking a Senior Systems Software Engineer to own full-stack OS enablement for the DGX Station, focusing on Windows and Linux integration so that AI applications run reliably on the platform.
Responsibilities:
- Own end-to-end Windows enablement for DGX Station—driving the platform from initial bring-up on Windows through WHQL certification to customer-ready shipping quality
- Drive Linux bring-up and continuous enablement for DGX Station on DGX OS / Ubuntu, including kernel module integration, device tree and ACPI configuration, systemd services, initramfs, and dkms packaging
- Enable and validate BIOS/UEFI, BMC, and system-level firmware for Windows and Linux on the Grace (Arm) + Blackwell GB300 architecture
- Coordinate GPU driver, display driver, and compute driver bring-up and validation on Windows (WDDM, MCDM) and Linux (open-gpu-kernel-modules, DRM/KMS)
- Ensure the CUDA toolkit, cuDNN, TensorRT, NCCL, and NVIDIA’s AI SDK stack are fully functional on DGX Station on both Windows and Linux
- Validate that NVIDIA AI applications—NIM microservices, NemoClaw, AI Workbench, and developer tools—run correctly on DGX Station across Windows and Linux
- Drive the overall test strategy for DGX Station on Windows and Linux: functional testing, stress testing, power/thermal validation, sleep/resume and S-state cycles, Windows Update and Linux kernel-upgrade compatibility, and long-duration reliability
- Be the primary technical interface with Microsoft (Windows on Arm, WHQL, driver signing) and ODM/OEM partners shipping DGX Station
- Profile and optimize system performance—boot time, GPU compute throughput, NVLink-C2C and memory bandwidth utilization, power efficiency, and thermal behavior
- Create and maintain platform documentation for DGX Station on Windows and Linux: bring-up guides, known issues, driver compatibility matrices, recovery and re-imaging procedures, and developer setup instructions
Requirements:
- BS or MS in Computer Science, Electrical Engineering, or a related field (or equivalent experience), plus 12+ years of systems software engineering experience, with deep expertise in Windows platform enablement, driver development, or OS integration and proven hands-on experience bringing up Linux on new hardware platforms
- Strong hands-on experience with Windows internals: kernel-mode drivers, ACPI, power management, Secure Boot, UEFI, WDM/WDF driver frameworks, and the WHQL certification process
- Solid understanding of Linux platform enablement: kernel modules, device tree / ACPI on Arm, systemd, initramfs, dkms, and packaging for Ubuntu / DGX OS
- Experience with the GPU driver stack, display drivers, or compute drivers on Windows and/or Linux. Familiarity with DirectX, WDDM, DRM/KMS, and GPU compute APIs is a strong plus
- Experience enabling hardware platforms—bring-up, driver integration, validation, and certification for shipping products on Windows and Linux
- Strong debugging and root-cause analysis skills across firmware, driver, and OS boundaries. Comfortable with WinDbg, kernel debugging (kd, kgdb/crash), crash dump analysis, ftrace/ETW, and performance profiling tools
- Ability to work across organizational boundaries—coordinating with GPU driver, CUDA, firmware, BMC, and AI software teams as well as external partners (Microsoft, ODM/OEMs)
- Proficiency in C/C++ and Python. Experience with Arm architecture is a plus
- Experience with Windows on Arm platforms—driver enablement, performance optimization, or application compatibility on Arm-based Windows devices
- Hands-on experience with CUDA, TensorRT, or AI/ML frameworks on Windows and Linux—especially on Arm + NVIDIA GPU systems
- Prior experience working with OEM/ODM partners or silicon vendors on Windows and Linux platform certification for workstation- or server-class hardware
- Track record of shipping workstation or server hardware products, from bring-up through general availability, with both Windows and Linux support
- Experience with BMC, Redfish, out-of-band management, or platform manageability software on high-end workstations or servers
- Experience with GPU-accelerated applications: AI training and inference, content creation tools, or scientific computing on Windows and Linux