Netflix is a leading entertainment company on a mission to entertain the world. They are seeking a Distributed Systems Engineer to provide strategic technical leadership for machine learning data products, focusing on building and optimizing distributed systems that support ML at scale.
Responsibilities:
- Lead the technical vision for ML-oriented data products
- Drive the strategy for how we produce, manage, and deliver data for feature computation, model training, online inference, feedback loops, and model evaluation across the Commerce ecosystem
- Identify cross-team opportunities to improve ML data availability, consistency, lineage, observability, and reliability
- Architect and build distributed data systems at scale
- Design and implement batch and real-time pipelines, event-driven data products, and multi-tenant distributed systems using Spark, Flink, Kafka, and other core Netflix frameworks
- Shape the next generation of ML-ready datasets powering a broad spectrum of use cases across Commerce
- Partner deeply with Platform teams
- Work closely with ML Platform to ensure feature stores, training pipelines, and inference paths are supported with correct freshness, quality, and service-level guarantees
- Influence and collaborate on platform primitives that improve the ML developer experience across the end-to-end lifecycle
- Be the connective tissue between ML needs and data/system design
- Translate ML requirements (latency, accuracy, consistency, backfillability, reproducibility) into data architecture decisions
- Proactively unblock ML partners by evolving data products, schema design, transport mechanisms, and low-latency interfaces
- Drive reliability, observability, and operational excellence
- Ensure ML-critical data systems meet high SLAs through strong observability, real-time alerting, debugging pipelines, and root-cause analysis
- Champion best practices for quality, reliability, testability, and automation across data products that operate 24x7
- Provide technical mentorship and influence
- Act as an engineering force multiplier across Commerce Data Engineering and partner teams
- Shape technical direction, design reviews, standards, and long-term architectural choices that raise the performance of the entire ecosystem
Requirements:
- You have a strong intuition about data for ML
- You understand feature computation, training/inference needs, offline/online consistency, and how data quality, latency, and drift impact model performance
- You know how to apply your analytical skills and data engineering fundamentals to achieve the desired outcomes
- You are proficient in at least one major language on the JVM stack (e.g., Java, Scala) and SQL (any variant)
- You strive to write elegant and maintainable code, and you're comfortable with picking up new technologies
- You have hands-on distributed systems experience
- You've built and operated large-scale, low-latency pipelines and services using technologies like Spark, Flink, Kafka, or equivalent frameworks
- You are capable of designing and building well-modeled, high-quality data products and interfaces that are easy to discover, consume, and maintain
- You are an excellent cross-functional communicator
- You can translate ML, product, and engineering needs into clear technical direction, and you can influence both teams and leadership on forward-looking investments
- You have a strong ownership mindset
- You care deeply about reliability, observability, operational excellence, and the long-term health of the systems you build
- You are comfortable with ambiguity
- You thrive in fast-moving environments, make sound judgments with incomplete context, and elevate teams with clarity and direction
- You relate to and embody many aspects of Netflix's Culture
- You love working independently while also collaborating and giving/receiving candid feedback