Reflection is on a mission to build open superintelligence and make it accessible to all. The company is seeking a Data Quality Engineer to ensure the quality and reliability of the data used to train its AI models, working closely with pre-training teams to establish measurable quality standards and automate quality assurance processes.
Responsibilities:
- Own upstream data quality for LLM pre-training, as a specialist or generalist across languages and modalities
- Partner closely with research and pre-training teams to translate requirements into measurable quality signals, and provide actionable feedback to external data vendors
- Design, validate, and scale automated QA methods, alongside human-in-the-loop processes, to reliably measure data quality across large campaigns
- Build reusable QA pipelines that reliably deliver high-quality data to pre-training teams
- Monitor and report on data quality over time, driving continuous iteration on quality standards, processes, and acceptance criteria