NVIDIA is known as 'the AI computing company' and is seeking a Senior Software Engineer - Server Manageability to innovate in managing GPU-based AI servers. The role involves designing and delivering BMC firmware solutions, collaborating with hardware teams, and ensuring software quality and security.
Responsibilities:
- Designing, implementing, and delivering innovations for managing GPU-based AI servers, with a focus on out-of-band (OOB) management, firmware development, server architecture, and building systems for the enterprise
- Leading BMC firmware design with a global team of engineers
- Designing and developing performance-optimized active monitoring BMC solutions using DMTF standards, including the MCTP, Redfish, SPDM, and PLDM specifications
- Instrumenting code to ensure maximum code coverage, writing and automating unit tests for each implemented module, and maintaining detailed unit test case reports
- Providing software quality reports based on static analysis, code coverage, and CPU load
- Working with the security team to ensure developed code is in line with product security goals
- Working closely with hardware teams to influence hardware design and review HW architecture & schematics
- Driving the definition and end-to-end delivery of all platforms by collaborating with internal teams, ODMs/OEMs, and industry partners for AI servers
- Working with QA/Test architects to define appropriate test tools and automation for qualifying the whole system software and firmware stack
Requirements:
- Domain expertise in BMC firmware development on x86 or ARM platforms, including BMC-BIOS communication, thermal management, power management, firmware update, device monitoring, firmware security, etc.
- Solid experience with end-to-end delivery of high-end enterprise servers, from definition to customer deployment
- Solid understanding of low-level interfaces between the SBIOS, BMC, and OS, such as I2C, SPI, PCIe, and JTAG
- Knowledge of PCIe enumeration and platform-level I/O for enterprise systems
- Experience working closely with HW teams, ODMs and vendors to introduce and support server platforms
- Experience with C/C++ development, Bash/Python scripting, and debugging in embedded Linux operating environments
- Excellent written and oral communication skills
- Strong work ethic and a strong sense of teamwork
- A love of producing quality work and a commitment to finishing your tasks every single day
- Self-starter who loves to find creative solutions to exciting problems
- Bachelor's degree, Master's degree, or PhD in Electrical Engineering or Computer Science (or equivalent experience) and 5+ years of experience, with demonstrated strong ability as an individual contributor
- Contributions to industry standards and open-source projects such as Open Compute, IPMI, DMTF standards, and OpenBMC
- Proven record of delivering BMC for enterprise servers with the OpenBMC firmware stack