Develop and implement state-of-the-art reinforcement learning algorithms designed to optimize decision-making processes in both simulated and real-world settings
Establish clear performance targets such as reward maximization and policy stability
Build, run, and monitor controlled reinforcement learning experiments
Identify and curate high-quality simulation environments and training datasets tailored to specific domain challenges
Systematically debug and optimize the reinforcement learning pipeline
Collaborate with cross-functional teams to integrate reinforcement learning agents into production systems
Requirements
A degree in Computer Science or a related field
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences)
Proven experience with large-scale reinforcement learning experiments, including online RL techniques such as Group Relative Policy Optimization (GRPO)
Deep understanding of reinforcement learning algorithms
Strong expertise in PyTorch and relevant reinforcement learning frameworks
Practical experience in developing RL pipelines, from simulation and online training to post-training evaluation
Proficiency in designing robust evaluation frameworks and iterating on algorithmic innovations