Liquid AI is a company spun out of MIT CSAIL that builds general-purpose AI systems for various deployment targets. It is seeking a Machine Learning Research Engineer to work on data processing and quality, with a focus on building and maintaining the data pipelines that improve model performance.
Responsibilities:
- Build and maintain data processing, filtering, and selection pipelines at scale
- Create pipelines for pretraining, midtraining, SFT, and preference optimization datasets
- Design synthetic data generation systems using LLMs, structured prompting, and domain-specific generators
- Design and run evaluations and ablations to measure each dataset's impact on model performance
- Monitor public datasets across text, vision, and audio domains
- Collaborate with pre-training, vision, and audio teams on modality-specific data needs