NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. The Senior System Software Engineer will develop GPU-accelerated AI inference serving software and drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks, contributing to a high-performance inference platform.

Responsibilities:

Develop world-class GPU-accelerated AI inference serving software
Contribute to feature development and drive broad customer adoption
Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform
Be an active member of the open source deep learning software engineering community
Balance a variety of objectives, such as building robust software for deployment in production server or cloud environments, optimizing the trade-off between prediction throughput and latency, and developing and adopting the next generation of inference technologies

Requirements:

MS or PhD in Computer Science or relevant field (or equivalent experience)
5+ years of professional experience working on deep learning software
Excellent Rust & C++ skills, familiarity with Python, and strong programming & software design skills including debugging, performance analysis, and test design
Experience with high-scale distributed systems and ML systems
Strong communication skills and ability to work in a fast-paced, agile team environment
Prior experience with AI frameworks and engines, such as TensorRT, PyTorch, ONNX, OpenVINO, vLLM, or TRT-LLM
Knowledge of GPU memory management, cache management, or high-performance networking
Experience with distributed systems programming
Experience contributing to a large open source project: use of GitHub, bug tracking, branching and merging code, OSS licensing issues, handling patches, etc.

Senior System Software Engineer - Dynamo-Triton Inference Server
