DigitalOcean is a technology company focused on simplifying cloud and AI for builders. It is seeking a Senior Engineer 2 to lead the technical strategy for AI Inference Optimization, ensuring high performance and efficiency in its inference services.
Responsibilities:
- Lead the technical strategy for benchmarking and performance optimizations at the inference engine and GPU kernel layers, ensuring our infrastructure extracts maximum value from every TFLOP
- Engineer solutions for complex performance issues, including attention layer optimizations, memory and precision management, and advanced parallelization across multi-node GPU clusters
- Proactively implement cutting-edge optimization techniques to keep DigitalOcean at the forefront of the generative AI landscape
- Act as the subject matter expert on modern GPU families (NVIDIA/AMD) and their software stacks (CUDA, ROCm, TensorRT, OpenAI Triton), advising on hardware procurement and software integration
- Lead by example through high-quality code and design reviews, elevating the technical bar for the team without the administrative overhead of direct management
- Partner with Product Management and TPMs to translate 'theoretical hardware limits' into 'shippable product features,' ensuring our platform is both powerful and developer-friendly
- Maintain a strong presence in the GPU infrastructure and model performance optimization communities, contributing to and integrating the best of open-source AI