Netflix is a leading entertainment company dedicated to pushing the boundaries of storytelling through innovative technology. The role involves developing and maintaining software for their Kubernetes container orchestration platform, optimizing compute infrastructure, and collaborating with various teams to enhance the performance and reliability of their cloud services.
Responsibilities:
- Build and maintain the software that runs our Kubernetes container orchestration platform
- Architect and design innovative solutions to support new workloads and features, and improve the reliability and performance of existing workloads
- Develop and maintain Kubernetes and containerd customizations and plugins
- Contribute to the upstream containerd and Kubernetes projects
- Debug performance and operational problems observed with container workloads
- Continuously and proactively improve the efficiency of our compute platform
Requirements:
- Minimum of 5 years of experience evolving compute infrastructure for a large organization, and 8+ years of total software development experience
- Experience supporting containers and related runtimes as a service (e.g., Kubernetes kubelet, containerd, runc, NRI plugins)
- Experience debugging system performance issues in a Linux environment
- Excellent operational and troubleshooting skills
- Experience designing large-scale distributed systems, preferably a compute orchestration system like Kubernetes
- Proficiency in Go, Java, or C/C++
- Understanding of networking concepts (TCP, IPv4, sockets, host and service networking in a containerized environment)
- Ability to thrive amid ambiguity in a "context, not control" environment while moving at high velocity
- Excellent communication and collaboration skills
- Track record of successful contributions to open source projects
- Linux kernel development experience
- Experience managing compute infrastructure for AI/ML workloads