Odyssey is an AI lab pioneering general-purpose world models, and it is seeking an engineer to build the infrastructure that supports its groundbreaking research and products. The role involves developing a low-latency model inference platform, scaling data processing infrastructure, and collaborating with researchers to optimize their workflows.
Responsibilities:
- Develop and operate our low-latency model inference platform, ensuring high availability, scalability, and efficient resource utilization for Odyssey’s world models
- Engineer and scale our core data processing infrastructure (e.g., Flyte, and Ray on k8s) to handle petabyte-scale datasets
- Design, build, and maintain our large-scale, GPU-based training clusters for deep learning, focusing on usability, high throughput, and reliability
- Automate infrastructure provisioning, configuration, monitoring, and alerting using Infrastructure as Code (IaC) principles
- Drive performance tuning, cost optimization, and reliability improvements across the entire stack
- Collaborate closely with researchers and product developers to understand their requirements, optimize their workflows, and improve platform usability