Cohere is dedicated to scaling intelligence to serve humanity by training and deploying frontier models for AI systems. The Senior Research Scientist in Model Evaluation will create next-generation evaluation methods and infrastructure to measure LLM progress, working with cross-functional teams to improve how models are evaluated.
Responsibilities:
- Create ambitious new evaluation benchmarks that push the limits of what our models can accomplish
- Work on highly cross-functional teams to translate model feedback into trustworthy, repeatable evaluations
- Conduct research to advance the state of the art in LLM evaluation methods, including training LLM judges, refining LLM-based data synthesis pipelines, and improving evaluation efficiency
- Build scalable, reusable tools for analyzing model performance