Develop and implement state-of-the-art reinforcement learning algorithms designed to optimize decision-making processes in both simulated and real-world settings
Establish clear performance targets such as reward maximization and policy stability
Build, run, and monitor controlled reinforcement learning experiments
Track key performance indicators while documenting iterative results and comparing outcomes against established benchmarks
Identify and curate high-quality simulation environments and training datasets tailored to specific domain challenges
Systematically debug and optimize the reinforcement learning pipeline by analyzing both computational efficiency and learning performance metrics
Collaborate with cross-functional teams to integrate reinforcement learning agents into production systems
Requirements
A degree in Computer Science or a related field
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with good publications at A* conferences)
Proven experience with large-scale reinforcement learning experiments, including online RL techniques such as Group Relative Policy Optimization (GRPO)
Deep understanding of reinforcement learning algorithms, including state-of-the-art online RL methods and gradient-based policy optimization approaches such as policy gradients, actor-critic methods, and GRPO
Strong expertise in PyTorch and relevant reinforcement learning frameworks
Practical experience in developing RL pipelines, from simulation and online training to post-training evaluation and deployment of RL-based solutions in production environments