Preference Model is working toward automated ML research engineering, with a focus on creating high-quality RL training environments. The company is seeking Research Engineers or Research Scientists to advance self-directed learning and optimize post-training of large language models, in a role that blends research and engineering responsibilities.
Responsibilities:
- Train and evaluate models on our proprietary RL environments to validate data quality, surface gaps in task coverage, and close the feedback loop between environment design and model capability
- Architect and optimize our RL training infrastructure, from training abstractions to distributed experiment management, using frameworks like Verl, OpenRLHF, or similar
- Help scale our systems to handle increasingly complex research workflows
- Design, implement, and test training environments, evaluations, and methodologies for RL agents
- Profile and optimize training runs end-to-end, from data loading through reward computation, to maximize experiment throughput and shorten the research iteration cycle