Netflix aims to entertain the world through innovative storytelling and technology. The company is seeking a Senior Distributed Systems Engineer to design, build, and operate the next generation of its experimentation and feature flag infrastructure, and to keep these critical services highly available and performant.
Responsibilities:
- Build and evolve critical experimentation and feature flag services
  - Design and implement high-scale, low-latency services for experiment allocation and feature flag evaluation
  - Advance core distributed systems for decentralized allocation, rules evaluation, and real-time decisioning
- Own reliability and performance
  - Participate in on-call, lead incident response, and drive long-term reliability improvements
  - Instrument services with rich observability (metrics, logs, traces) and continuously tune for resilience, performance, and scalability
- Shape data and integration surfaces
  - Collaborate with teams using technologies like Flink, Spark, Elasticsearch, and Druid to ensure experimentation data is correct, timely, and usable
  - Define clear data and API contracts for consumers and pipelines
- Partner with product engineering teams
  - Deeply understand experimentation and rollout workflows across Netflix
  - Simplify and improve the developer experience for configuring, launching, and monitoring experiments and feature flags
Requirements:
- You've built and operated scalable, reliable backend services and understand how to apply core algorithms and systems design to real-world distributed systems
- You write high-quality code in Java or another JVM language and have owned production services end-to-end: monitoring, on-call, debugging, and systematic performance and reliability improvements
- You work effectively with other senior engineers and cross-functional partners, bring clarity to ambiguity, drive decisions, and keep a sharp focus on delivering value to internal users and stakeholders