DigitalOcean, a leading cloud services provider, is seeking a Senior Engineer 2 to join its AI Inference Optimization team. The role involves making architectural decisions that maximize the performance of inference services and leading the technical strategy for performance optimization.
Responsibilities:
- Lead the technical strategy for benchmarking and performance optimizations at the inference engine and GPU kernel layers, ensuring our infrastructure extracts maximum value from every TFLOP
- Engineer solutions for complex performance issues, including attention layer optimizations, memory and precision management, and advanced parallelization across multi-node GPU clusters
- Proactively implement cutting-edge optimization techniques to keep DigitalOcean at the forefront of the Gen AI landscape
- Act as the subject matter expert on modern GPU families (NVIDIA/AMD) and their software stacks (CUDA, ROCm, TensorRT, OpenAI Triton), advising on hardware procurement and software integration
- Lead by example through high-quality code and design reviews, elevating the technical bar for the team without the administrative overhead of direct management
- Partner with Product Management and TPMs to translate "theoretical hardware limits" into "shippable product features," ensuring our platform is both powerful and developer-friendly
- Maintain a strong presence in the GPU infrastructure and model performance optimization communities, contributing to and integrating the best of open-source AI
Requirements:
- 5+ years of experience in high-performance computing or AI infrastructure, with a proven track record of solving compute utilization and memory bandwidth bottlenecks
- Deep familiarity with the Gen AI (LLM, VLM, LMM) landscape, including the specific quirks and architectural requirements of major model families
- Hands-on experience with attention-layer optimizations and parallelization strategies across distributed GPU environments
- Comprehensive understanding of NVIDIA and AMD GPU architectures and their respective software ecosystems (CUDA, ROCm, etc.)
- Extensive experience integrating, building with, and contributing to open-source software projects
- Excellent system design skills, particularly in low-level GPU programming: optimization, memory access patterns, and parallel execution
- Experience acting as a technical lead, driving design and delivery through cross-functional alignment and expert-level delegation