Polymath is an applied research lab focused on advancing long-horizon agent capabilities through reinforcement learning. The lab is seeking talented researchers currently enrolled in MS or PhD programs to collaborate on a research project that develops benchmarks and trains autonomous agents for complex tasks.
Responsibilities:
- Identifying failure modes in frontier models
- Developing rigorous benchmarks that evaluate how well frontier agents perform on complex, realistic tasks requiring long-horizon reasoning and tool use in dynamic environments
- Training autonomous agents that can reason, plan, and act over extended time horizons