DigitalOcean, a leading cloud services provider, is seeking a Senior Engineer 2 to join its AI Inference Optimization team. The role involves making architectural decisions that maximize the performance of inference services and leading the technical strategy for performance optimization.

Responsibilities:

Lead the technical strategy for benchmarking and performance optimizations at the inference engine and GPU kernel layers, ensuring our infrastructure extracts maximum value from every TFLOP
Engineer solutions for complex performance issues, including attention layer optimizations, memory and precision management, and advanced parallelization across multi-node GPU clusters
Proactively implement cutting-edge optimization techniques to keep DigitalOcean at the forefront of the Gen AI landscape
Act as the subject matter expert on modern GPU families (NVIDIA/AMD) and their software stacks (CUDA, ROCm, TensorRT, OpenAI Triton), advising on hardware procurement and software integration
Lead by example through high-quality code and design reviews, elevating the technical bar for the team without the administrative overhead of direct management
Partner with Product Management and TPMs to translate "theoretical hardware limits" into "shippable product features," ensuring our platform is both powerful and developer-friendly
Maintain a strong presence in the GPU infrastructure and model performance optimization communities, contributing to and integrating the best of open-source AI

Requirements:

5+ years of experience in high-performance computing or AI infrastructure, with a proven track record of solving compute utilization and memory bandwidth bottlenecks
Deep familiarity with the Gen AI (LLM, VLM, LMM) landscape, including the specific quirks and architectural requirements of major model families
Hands-on experience with attention-layer optimizations and parallelization strategies across distributed GPU environments
Comprehensive understanding of NVIDIA and AMD GPU architectures and their respective software ecosystems (CUDA, ROCm, etc.)
Extensive experience integrating, building with, and contributing to open-source software projects
Excellent system design skills, particularly in low-level GPU programming: optimization, memory access patterns, and parallel execution
Experience acting as a technical lead, driving design and delivery through cross-functional alignment and expert-level delegation

Senior Engineer 2: Inference Optimizations
