Conduct literature reviews and implement state-of-the-art algorithms in RL and self-distillation.
Design and execute experiments to evaluate the effectiveness of proposed methods on code generation and agentic tasks.
Develop and maintain codebases for both theoretical modeling and practical implementations.
Collaborate with researchers to analyze results, refine methodologies, and prepare findings for publication.
Contribute to the design of mechanisms for handling large rollouts, such as summarization and hierarchical sub-agents.
Document progress, methodologies, and outcomes clearly and comprehensively.

Strong background in machine learning, particularly reinforcement learning and deep learning.
Proficiency in Python and experience with ML frameworks (e.g., PyTorch, TensorFlow).
Familiarity with LLMs and their training paradigms.
Experience with coding tasks, unit testing, or compiler tools is a plus.
Currently pursuing a Master’s or PhD in Computer Science, Machine Learning, or a related field.
Ability to work independently and manage complex projects.
Strong problem-solving and analytical skills.
Excellent communication skills for collaborating with a research team.
Prior experience with reinforcement learning with verifiable rewards (RLVR), self-distillation, or large-scale ML experiments is highly desirable.
Willingness to learn and adapt to new methodologies and tools.

A weekly lunch stipend of $75/£75, or the equivalent in your local currency.
Full health and dental benefits, including a separate budget for mental health.
RRSP matching, 401(k), pension scheme.
100% Parental Leave top-up for up to 6 months, for either parent.
Annual enrichment benefits: Arts & culture, fitness/wellness, quality time, and a workspace improvement credit. Education & learning stipend for conferences, courses, and coaching.
6 weeks of paid vacation (30 working days!).
Budget for traveling to other offices if you are remote, plus an annual company offsite.

Research Internship, Reinforcement Learning

Key skills