Netflix is a global entertainment company dedicated to pushing the boundaries of storytelling and technology. It is seeking an experienced Machine Learning Engineer to design and build systems that improve the training and inference efficiency of Large Language Models (LLMs) and other media ML models, collaborating with a cross-functional team to deliver impactful ML solutions.
Responsibilities:
- Design and build scalable training and inference systems for LLMs, Multimodal LLMs, and other media ML models
- Optimize end-to-end training: data pipelines (streaming, sharding, bucketing), distributed training (parallelism strategies), and mixed precision
- Optimize inference and serving: KV cache, batching, quantization, and long-context handling
- Scale model training and inference into robust, performant systems integrated into Netflix workflows
- Act as a technical thought leader for training and inference efficiency, driving initiatives that significantly improve scalability, latency, and reliability
- Mentor and uplevel other engineers and scientists in large-scale ML systems and performance engineering
Requirements:
- Extensive experience in ML engineering for large, production-grade systems using LLMs, Multimodal LLMs, and other media ML models
- Deep hands-on expertise in training optimization: high-throughput data loading (streaming, sharding, bucketing); distributed training (parallelism strategies); GPU/accelerator optimization
- Strong experience in inference optimization: KV cache design and optimization; batching and scheduling for high-throughput, low-latency serving; quantization and/or model compression
- Proficiency in PyTorch, plus solid software engineering fundamentals (testing, observability, performance profiling)
- Proven track record of leading ML initiatives and partnering with stakeholders to define and execute impactful roadmaps
- Exceptional communication and collaboration skills; comfortable with ambiguity and high ownership
- Netflix culture resonates with you