The UVA VEC is inviting PhD researchers and technical experts to contribute to an advanced AI research initiative focused on evaluating next-generation reasoning models. The role involves designing challenging STEM tasks and analyzing how AI agents reason through complex problems.
Responsibilities:
- Design challenging real-world STEM problems across data science, machine learning, finance, and coding
- Implement tasks within an agentic development environment using Python
- Develop reproducible benchmark tasks with executable tests and clear specifications
- Analyze model and agent behavior to identify reasoning failures
- Contribute insights that improve evaluation methodologies for frontier AI systems
- Document environments, assumptions, and experimental outcomes