The UVA VEC is inviting PhD researchers and technical experts to contribute to an advanced AI research initiative that evaluates next-generation reasoning models. The role involves designing challenging STEM tasks and analyzing how AI agents reason through complex problems.

Responsibilities:

Design challenging real-world STEM problems across data science, machine learning, finance, and coding
Implement tasks within an agentic development environment using Python
Develop reproducible benchmark tasks with executable tests and clear specifications
Analyze model and agent behavior to identify reasoning failures
Contribute insights that improve evaluation methodologies for frontier AI systems
Document environments, assumptions, and experimental outcomes

PhD Research Rater – AI Model Evaluation (Remote)
