Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company is seeking a Helix AI Engineer, Video Pretraining, to lead the development of large-scale video foundation models that advance capabilities in perception, prediction, and embodied reasoning.
Responsibilities:
- Design and train large-scale video foundation models on diverse datasets spanning internet-scale video and robot-collected data
- Develop pretraining strategies that capture temporal dynamics, motion, and object interaction from raw video sequences
- Build models that learn transferable representations for downstream tasks such as perception, tracking, prediction, and control
- Explore architectures for video understanding and generation, including transformer-based and diffusion-based approaches
- Implement efficient data pipelines and training strategies for high-throughput video ingestion and large-scale distributed training
- Optimize model performance under compute, memory, and training-efficiency constraints
- Collaborate closely with generative modeling, agent, and robot learning teams to integrate pretrained models into the autonomy stack
- Design evaluation frameworks and benchmarks to measure temporal understanding, prediction quality, and generalization