SambaNova is a pioneering company in the generative AI space, providing a full-stack platform for organizations deploying AI at scale. The role involves optimizing and scaling advanced foundation models, bridging deep learning and systems engineering to deliver exceptional AI inference performance.
Responsibilities:
- Bring up and optimize cutting-edge foundation models (e.g., DeepSeek, Llama, Qwen, and others) on the SambaNova platform through the SambaNova software stack
- Profile and enhance model performance across the compiler, runtime, and hardware layers to achieve state-of-the-art (SOTA) throughput and latency
- Collaborate with machine learning, compiler, runtime, and hardware teams to deliver co-designed, high-performance AI applications
- Integrate the latest advances in model architecture, quantization, scheduling, and memory optimization from both academia and industry
- Develop robust, scalable, and efficient end-to-end inference solutions aligned with customer needs
- Identify performance bottlenecks and propose dataflow or scheduling optimizations for both single-node and distributed systems
Requirements:
- Bachelor's or higher degree in computer science, electrical engineering, or a related field (e.g., applied mathematics, physics, or statistics)
- 3+ years of experience in one or more of the following areas: deep learning model development and performance optimization; compiler, runtime, or kernel-level optimization; software–hardware co-design or systems performance tuning
- Proficiency in Python or C++, with strong foundations in algorithms, data structures, and numerical computing
- Experience with at least one major ML framework (PyTorch, TensorFlow, or JAX)
- Demonstrated ability to analyze and optimize performance in real-world ML pipelines
- Hands-on experience with LLM or multimodal model training and inference
- Background in large-scale distributed training, continuous batching, and high-throughput inference systems
- Familiarity with quantization, graph optimization, kernel fusion, and model partitioning
- Experience with frameworks such as DeepSpeed, Megatron, vLLM, or TensorRT
- Strong GPU programming skills (CUDA, Triton, or OpenCL); experience with cuDNN, cuBLAS, or similar libraries is a plus
- Knowledge of memory hierarchy optimization, caching, and scheduling for large-scale model execution
- Publication record or open-source contributions in ML systems or performance optimization is a plus