HUD is building infrastructure for reinforcement learning (RL) training data and evaluations for frontier AI agents. The company is seeking research engineers to develop quality assurance (QA) systems for the training data that other companies generate on this infrastructure.
Responsibilities:
- Define and enforce quality standards for training data
- Build tooling and workflows to audit supplier-generated datasets, including sampling strategies, validation pipelines (rule-based and model-assisted), and feedback loops
- Determine whether and how human-in-the-loop review workflows can improve QA
- Partner with data vendors to debug quality issues, provide actionable feedback, and improve their data generation processes
- Continuously integrate QA learnings into the infrastructure tools and the data vendor portal to reduce anomalies, inconsistencies, and edge cases
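
As a rough illustration of the auditing work described above, the following is a minimal sketch of per-vendor sampling followed by rule-based validation. It is not HUD's actual tooling: the record fields (`id`, `vendor`, `prompt`, `reward`), the two rules, and all function names are hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict


def sample_per_vendor(records, k, seed=0):
    """Draw up to k records per vendor (a simple stratified sample)."""
    rng = random.Random(seed)  # fixed seed so an audit can be reproduced
    by_vendor = defaultdict(list)
    for rec in records:
        by_vendor[rec["vendor"]].append(rec)
    return {v: rng.sample(recs, min(k, len(recs))) for v, recs in by_vendor.items()}


def rule_violations(rec):
    """Return the names of the (hypothetical) rules that a record breaks."""
    issues = []
    if not (rec.get("prompt") or "").strip():
        issues.append("empty_prompt")
    reward = rec.get("reward")
    if not isinstance(reward, (int, float)) or not 0.0 <= reward <= 1.0:
        issues.append("reward_out_of_range")
    return issues


def audit(records, k=2, seed=0):
    """Sample each vendor's records and report which sampled records fail a rule."""
    report = {}
    for vendor, sample in sample_per_vendor(records, k, seed).items():
        checked = [(rec["id"], rule_violations(rec)) for rec in sample]
        flagged = [(rec_id, issues) for rec_id, issues in checked if issues]
        report[vendor] = {"sampled": len(sample), "flagged": flagged}
    return report
```

The per-vendor report is the kind of output that could feed a feedback loop with a supplier; a model-assisted check or a human review queue could be added as a further step on the flagged records.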