Netflix is a leading entertainment company focused on pushing the boundaries of storytelling through technology. It is seeking a Senior Engineering Manager to lead the Model Inference & Serving pillar within the ML Platform organization. The role is responsible for setting the strategic vision and driving execution across multiple teams to deliver core model serving infrastructure.
Responsibilities:
- Define and communicate the pillar’s multi-year vision, technical strategy, and roadmap
- Anticipate future platform and business needs, especially as ML architectures and use cases evolve
- Drive the transition from legacy, domain-based serving to a unified, modular, and domain-agnostic serving platform
- Manage and mentor engineering managers and technical leads; build a strong leadership bench
- Foster a culture of high performance, candor, innovation, and inclusion, aligned with Netflix’s values
- Attract, hire, and retain outstanding talent across the pillar
- Set and uphold technical standards for reliability, scalability, and performance across all teams
- Oversee development of foundational serving infrastructure: real-time/batch inference, frameworks, experimentation, control plane, and tooling
- Ensure robust support for diverse model types (deep learning, LLMs, bandits, etc.), hardware targets (CPU/GPU), and SLAs
- Own operational health and reliability at scale, including observability, SLOs, and incident response
- Build and maintain strong partnerships with ML practitioners, product engineering, infrastructure, and platform teams
- Represent the Model Inference & Serving pillar to Netflix senior leadership, clearly communicating the vision, progress, and priorities
- Influence and drive alignment on platform direction, investment, and priorities
Requirements:
- Proven success managing multiple managers in high-scale ML infrastructure/platform environments
- 10+ years of technical experience, with 5+ years in engineering management roles
- Deep expertise in ML model serving, distributed systems, and high-scale production environments
- Strategic thinking with a track record of delivering complex, cross-team initiatives
- Excellent communication and stakeholder management skills
- Experience driving organizational change and leading through ambiguity
- Experience with modern ML frameworks (e.g., PyTorch, TensorFlow), inference engines (e.g., Triton, vLLM), and experimentation platforms is a strong plus
- MS/PhD in Computer Science, Engineering, or a related field, or equivalent experience, preferred