Runway is building AI to simulate the world by merging art and science. The company is seeking a Dataset Engineer to curate, build, and optimize datasets for model training. The role requires strong machine learning skills and experience with large-scale datasets.
Responsibilities:
- Develop and maintain large-scale, multimodal datasets for training and evaluating models
- Optimize models for data preprocessing tasks
- Create and run evaluations and benchmark analyses for datasets and models
- Implement fast iteration cycles and feedback loops to continuously improve model datasets
- Work with a world-class research team to push the boundaries of content creation
- Evaluate new datasets and models for upstream data tasks that feed into our products
Requirements:
- 5+ years of relevant experience in machine learning or dataset engineering, ideally with multimodal datasets
- Experience with running and optimizing models offline at large scale
- Excellent data modeling skills and experience with data curation
- Proficiency in model finetuning and optimization for data preprocessing
- Strong data analysis and SQL skills
- Experience in creating evaluations and running benchmark analyses
- Solid knowledge of at least one machine learning framework (e.g. PyTorch, JAX, TensorFlow)
- Very strong programming skills and the ability to write clean, maintainable code
- Deep interest in building human-in-the-loop systems for creativity
- Ability to rapidly prototype solutions and iterate on them under tight product deadlines
- Strong familiarity with tools such as Ray, Kubernetes, Airflow, and Prefect
- Strong communication, collaboration, and documentation skills