Maintain healthy data streams running in production, including vision systems, robot telemetry, and other modalities.
Implement data versioning and metadata tagging to ensure complete dataset reproducibility.
Architect flexible schemas that adapt to new sensor inputs or modalities without breaking downstream training.
Ensure precise synchronization between images and action tokens or telemetry.
Develop programmatic filters to flag or discard corrupted, blurry, or incomplete data before training.
Monitor continuous data streams in real time for distribution shifts, missing metadata, or schema violations.
Develop systems for automatic camera calibration and configuration, and for easy system maintenance.
Innovate on our hardware setup: experiment with new cameras and develop novel approaches.
Work closely with the foundational AI research team (Vision-Language-Action models) to provide high-quality data for the next generation of robotics.
Requirements
Experience creating and maintaining large data streams, preferably for AI training applications
Proven track record of working with vision systems and a deep understanding of the fundamentals (e.g. camera intrinsics, extrinsics, exposure, gain)
Technical leadership skills and experience
Strong problem-solving skills and proficiency in a modern programming language (we use Python)
Experience designing systems that span software and hardware components