NVIDIA is hiring a Senior System Software Engineer to work on Dynamo, focusing on GPU-accelerated deep learning software. The role involves developing open-source software for running inference with trained AI models on GPUs and contributing components and features that make distributed inference workloads more efficient.
Responsibilities:
- Contribute to the development of disaggregated serving for Dynamo-supported inference engines (vLLM, SGLang, TRT-LLM) and expand to support multi-modal models for embedding disaggregation
- Innovate in the management and transfer of large KV caches across heterogeneous memory and storage hierarchies, using the NVIDIA Inference Xfer Library (NIXL) for low-latency, cost-effective data movement
- Build new features for the Dynamo Rust Runtime Core Library and design, implement, and optimize distributed inference components in Rust and Python
- Balance a variety of objectives: build robust, scalable, high-performance software components to support our distributed inference workloads; work with team leads to prioritize features and capabilities; load-balance asynchronous requests across available resources; optimize prediction throughput under latency constraints; and integrate the latest open-source technology
Requirements:
- Master's or PhD, or equivalent experience
- 3+ years of experience in Computer Science, Computer Engineering, or a related field
- Ability to work in a fast-paced, agile team environment
- Excellent Rust/Python/C++ programming and software design skills, including debugging, performance analysis, and test design
- Experience with large-scale distributed systems and ML systems
- Prior contributions to open-source AI inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang)
- Experience with GPU memory management, cache management, or high-performance networking
- Understanding of LLM-specific inference challenges, such as context window scaling and multi-model agentic and reasoning workflows
- Prior experience with disaggregated serving and multi-modal models (vision-language, audio-language, and video-language models)