Develop and implement state-of-the-art reinforcement learning algorithms
Establish clear performance targets, such as reward maximization and policy stability
Build, run, and monitor controlled reinforcement learning experiments
Track key performance indicators while documenting iterative results
Identify and curate high-quality simulation environments and training datasets
Systematically debug and optimize the reinforcement learning pipeline
Collaborate with cross-functional teams to integrate reinforcement learning agents into production systems
Requirements
A degree in Computer Science or a related field
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences)
Proven experience with large-scale reinforcement learning experiments, including online RL techniques such as Group Relative Policy Optimization (GRPO)
Deep understanding of reinforcement learning algorithms
Strong expertise in PyTorch and relevant reinforcement learning frameworks
Practical experience in developing RL pipelines
Demonstrated ability to apply empirical research to overcome reinforcement learning challenges