Zoom is dedicated to building world-class inference infrastructure that powers all of its AI services. As an AI Software Engineer on Zoom’s AI Infra team, you will design, optimize, and scale the runtimes and services behind Zoom’s AI models, improving efficiency and reducing costs across the entire AI stack.
Responsibilities:
- Develop and optimize AI runtimes for large language model (LLM), automatic speech recognition (ASR), and machine translation (MT) systems, with a focus on performance and cost efficiency
- Apply GPU-level optimization techniques, including custom CUDA kernels, kernel fusion, and memory-bandwidth improvements
- Implement inference optimizations such as torch.compile, graph optimization, KV caching, and continuous batching
- Build scalable, highly available infrastructure services to support enterprise-grade AI workloads
- Optimize models for edge devices (laptops, PCs, and mobile devices) as well as for large-scale cloud deployments
- Continuously improve latency, throughput, and efficiency across serving pipelines
- Rapidly integrate and optimize new industry models to stay ahead in AI infrastructure