Orbifold AI is building foundational infrastructure for the next generation of physical AI, collaborating with leading robotics and world model research teams. They are seeking a Machine Learning Engineer to scale and optimize their ML infrastructure and to manage large volumes of multimodal data for advanced AI applications.
Responsibilities:
- Architect, build, and optimize distributed ML pipelines on Ray (Ray Core, Ray Train, Ray Serve) and PyTorch, designed for the demands of multimodal video, image, and sensor data at scale
- Profile and tune distributed training jobs and inference deployments to maximize GPU/CPU utilization and reduce latency
- Build robust abstractions and internal tools that let our researchers and product engineers deploy PyTorch models onto our Ray clusters seamlessly
- Design and maintain high-throughput video processing pipelines (e.g., FFmpeg, NVDEC/NVENC, frame-level indexing) that feed our curation, training, and evaluation workloads
- Ensure the high availability, fault tolerance, and observability of our distributed compute systems
- Build the serving infrastructure for our evaluation harnesses, verification models, and RL environments
- Collaborate with research, data, and product engineering teams to translate modeling constraints into scalable infrastructure solutions