Preference Model is building automated ML research engineering, with a focus on creating high-quality RL training environments. The company is seeking Research Engineers and Research Scientists to advance self-directed learning and optimize post-training of large language models, in a role that blends research and engineering responsibilities.

Responsibilities:

Train and evaluate models on our proprietary RL environments to validate data quality, surface gaps in task coverage, and close the feedback loop between environment design and model capability
Architect and optimize our RL training infrastructure, from training abstractions to distributed experiment management, using frameworks like Verl, OpenRLHF, or similar
Help scale our systems to handle increasingly complex research workflows
Design, implement, and test training environments, evaluations, and methodologies for RL agents
Profile and optimize training runs end-to-end, from data loading through reward computation, to maximize experiment throughput and shorten the research iteration cycle

Research Engineer / Research Scientist
