DigitalOcean is a cutting-edge technology company seeking a Senior Engineer 2 to join its GPU Kernel and Performance team. The role involves optimizing GPU kernels and improving the performance of inference services for large models.
Responsibilities:
- Design and implement high-performance GPU kernels using Triton and CUDA C++
- Develop and deploy state-of-the-art quantization techniques (FP8, INT8, and experimental FP4) to double throughput without losing accuracy
- Optimize memory access patterns (SRAM vs. HBM3e) to eliminate bottlenecks in long-context attention mechanisms
- Integrate recent advances such as FlashAttention-4 and TileLang directly into our production stack
Requirements:
- Deep understanding of GPU architectures (SMs, warp scheduling, Tensor Cores)
- Expert-level proficiency in Triton or CUDA
- Strong grasp of linear algebra and how it maps to parallel hardware
- A track record of optimizing kernels to achieve >80% of theoretical hardware peak performance
- A plus if you've contributed to the Triton compiler or written custom CUDA kernels for a major LLM