Design and implement AI inference and model training cloud products optimized for Kubernetes, from autoscaling inference servers to distributed training jobs across GPU fleets
Write clean, efficient, and maintainable Go code to power Kubernetes controllers, operators, and custom resources supporting AI workloads
Build APIs, CLIs, and developer tools that simplify the deployment, lifecycle management, and monitoring of AI applications
Develop features that optimize serverless container workflows for AI, ensuring fast cold starts, resource-efficient scaling, and workload isolation
Contribute to system performance, reliability, and security, with a focus on AI-specific challenges such as GPU scheduling, job orchestration, and data throughput
Stay on top of Kubernetes ecosystem advancements (e.g., K8s-native ML tooling, scheduling improvements, SIGs) and influence our product roadmap accordingly
Requirements
Strong proficiency in Go programming, with experience in Kubernetes development, including controllers and operators.
Deep understanding of Kubernetes architecture, resource management, and container orchestration.
Experience working with Kubernetes APIs and custom resources (CRDs).
Solid knowledge of cloud-native technologies and frameworks, including Docker and Helm.
Strong problem-solving skills, with a passion for tackling complex challenges in distributed systems.
Excellent communication skills and the ability to thrive in a collaborative, team-oriented environment.
Experience with the Python programming language (Nice to Have)
Experience developing AI/ML pipelines or integrating AI frameworks (e.g., TensorFlow, PyTorch) into Kubernetes (Nice to Have)
Understanding of GPU scheduling and optimization in Kubernetes environments (Nice to Have)
Knowledge of security best practices in Kubernetes, including role-based access control (RBAC) and container security (Nice to Have)
Contributions to open-source Kubernetes projects or cloud-native communities (Nice to Have)
Tech Stack
Cloud
Distributed Systems
Docker
Kubernetes
Python
PyTorch
TensorFlow
Go
Benefits
Competitive compensation
Flexible working hours
Hybrid or remote options, depending on your role
Work from anywhere in the world for up to 45 days per year
Private medical insurance for you and your family*
Extra paid vacation and sick leave days*
Support for life’s important moments and celebrations
Language courses to help you connect and grow
Modern, welcoming offices with snacks, drinks, and entertainment*