Reflection AI is on a mission to build open superintelligence and make it accessible to all. The Alignment Lead will drive the alignment stack, lead research efforts on reward models, curate training data, and optimize RL pipelines to enhance model performance.
Responsibilities:
- Drive the entire alignment stack, spanning instruction tuning, RLHF, and RLAIF, to push the model toward high factual accuracy and robust instruction following
- Lead research efforts to design next-generation reward models and optimization objectives that significantly improve performance on human-preference evaluations
- Curate high-quality training data and design synthetic data pipelines that close gaps in complex reasoning and model behavior
- Optimize large-scale RL pipelines for stability and efficiency, enabling rapid iteration on model improvements
- Collaborate closely with pre-training and evaluation teams to create tight feedback loops that translate alignment research into generalizable model gains