Drive innovation in reinforcement learning approaches for advanced models.
Optimize decision-making and adaptive behavior to deliver enhanced intelligence.
Curate specialized simulation environments and training datasets.
Strengthen baseline policy performance, resolving bottlenecks in the reinforcement learning process.
Collaborate with cross-functional teams to integrate reinforcement learning agents into production systems.
Requirements
A degree in Computer Science or a related field.
Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
Proven experience with large-scale reinforcement learning experiments, including online RL techniques such as Group Relative Policy Optimization (GRPO), is essential.
Deep understanding of reinforcement learning algorithms is required, including state-of-the-art online RL methods and gradient-based policy optimization approaches such as policy gradients, actor-critic, and GRPO.
Strong expertise in PyTorch and relevant reinforcement learning frameworks is a must.
Practical experience in developing RL pipelines, from simulation and online training to post-training evaluation and deployment of RL-based solutions in production environments, is expected.
Demonstrated ability to apply empirical research to overcome reinforcement learning challenges such as sample inefficiency, exploration-exploitation tradeoffs, and training instability.
Proficiency in designing robust evaluation frameworks and iterating on algorithmic innovations to continuously improve RL agent performance.