NVIDIA is a leading technology company known for its innovative work in AI and robotics. The company is seeking a Senior Software Engineer to build foundational infrastructure for Robotics Research, focusing on ML productivity tooling that improves research efficiency and GPU utilization.
Responsibilities:
- Build highly scalable, robust, and efficient CI/CD frameworks. The workload is data-intensive and requires heterogeneous CPU/GPU computation
- Build world-class visualization tools for analyzing and optimizing all of our datasets and compute jobs (across tens of thousands of GPUs)
- Develop and apply AI agents to significantly improve programming efficiency within the team and reduce the human effort needed to fix job failures
- Collaborate with researchers to gather requirements, understand their tooling, visualization, and automation needs, and deliver full-stack solutions that move the needle at the speed of light
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
- 12+ years of full-time industry experience in large-scale MLOps and AI infrastructure
- Strong experience in full-stack software development, with a focus on building CI/CD or visualization tools
- Proficient in both front-end and back-end programming with Python, JavaScript, SQL, or similar
- Familiar with modern front-end and back-end web technologies such as React and Node.js
- Knowledge of GPU technologies like CUDA and NCCL
- Master's or PhD degree in Computer Science, Robotics, Engineering, or a related field
- Demonstrated Tech Lead experience, coordinating a team of engineers and driving projects from conception to deployment
- Strong experience building and operating large-scale tooling infrastructure in production
- Strong background in, and curiosity about, frontier AI research
- Bonus: experience with PyTorch, Ray, and Kubernetes