Reflection AI is on a mission to build open superintelligence and make it accessible to all. They are seeking a Data Quality Engineer to ensure that the data used to train their models meets high standards for quality and reliability, directly impacting model performance.
Responsibilities:
- Own upstream data quality for LLM pre-training, as a specialist or generalist across languages and modalities
- Partner closely with research and pre-training teams to translate requirements into measurable quality signals, and provide actionable feedback to external data vendors
- Design, validate, and scale automated QA methods, alongside human-in-the-loop processes, to reliably measure data quality across large campaigns
- Build reusable QA pipelines that reliably deliver high-quality data to pre-training teams
- Monitor and report on data quality over time, driving continuous iteration on quality standards, processes, and acceptance criteria