DigitalOcean is a cutting-edge technology company seeking a Senior Engineer 2 to join its GPU Kernel and Performance team. The role focuses on optimizing GPU kernels and improving the performance of inference services for large models.

Responsibilities:

Design and implement high-performance GPU kernels using Triton and CUDA C++
Develop and deploy state-of-the-art quantization techniques (FP8, INT8, and experimental FP4) to double throughput without losing accuracy
Optimize memory access patterns across on-chip SRAM and off-chip HBM3e to eliminate bottlenecks in long-context attention mechanisms
Bring the latest advances, such as FlashAttention-4 and TileLang, directly into our production stack
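For readers less familiar with the quantization work described above, here is a minimal sketch of per-tensor symmetric INT8 quantization in plain Python. The function names are hypothetical and the scheme is only the simplest textbook variant; production kernels would run this on the GPU, typically with per-channel or per-block scales.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8 quantization (illustrative sketch).

    scale = max|x| / 127, q = clamp(round(x / scale), -127, 127).
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor so we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to nearest (Python's round() breaks ties to even), then clamp
    # to the symmetric INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map INT8 codes back to approximate real values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    x = [0.5, -1.0, 0.25, 0.0]
    q, s = quantize_int8(x)
    x_hat = dequantize_int8(q, s)
    # Per-element error is bounded by scale / 2 for values inside the range.
    print(q, s, max(abs(a - b) for a, b in zip(x, x_hat)))
```

FP8 and FP4 schemes follow the same scale-then-round idea but round onto a floating-point grid rather than a uniform integer grid.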

Requirements:

Deep understanding of GPU architectures (SMs, warp scheduling, Tensor Cores)
Expert-level Triton or CUDA
Strong grasp of linear algebra and how it maps to parallel hardware
A track record of optimizing kernels to achieve >80% of theoretical hardware peak performance
Experience contributing to the Triton compiler or writing custom CUDA kernels for a major LLM is a strong plus
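To make the ">80% of theoretical hardware peak" bar concrete, the sketch below shows the usual arithmetic: a GEMM of shape M x N x K performs 2·M·N·K floating-point operations, so achieved FLOP/s is that count divided by runtime, compared against the device peak. A simple roofline bound is included for memory-bound kernels. All numbers in the example are placeholders, not measurements of any particular GPU, and the helper names are hypothetical.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C[M,N] = A[M,K] @ B[K,N]: each output needs K multiply-adds."""
    return 2 * m * n * k


def fraction_of_peak(m: int, n: int, k: int, seconds: float, peak_flops: float) -> float:
    """Achieved FLOP/s as a fraction of the device's theoretical peak."""
    achieved = gemm_flops(m, n, k) / seconds
    return achieved / peak_flops


def roofline_bound(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Attainable FLOP/s under a simple roofline model.

    intensity is FLOPs per byte moved from memory; bandwidth is bytes/s.
    Low-intensity kernels are capped by bandwidth, high-intensity ones by compute.
    """
    return min(peak_flops, intensity * bandwidth)


if __name__ == "__main__":
    # Placeholder values: an 8192^3 GEMM timed at 1.5 ms on a hypothetical
    # device with a 1e15 FLOP/s peak.
    frac = fraction_of_peak(8192, 8192, 8192, 1.5e-3, 1.0e15)
    print(f"{frac:.1%} of peak")
```

For memory-bound kernels such as long-context decode attention, "peak" is usually measured against the bandwidth roof rather than the compute roof.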
