Execute a dedicated work plan to build frameworks that evaluate the performance, safety, and alignment of RL agents
Use Bayesian ML models (GPs, BNNs) to create metrics for model confidence and risk
Design and set up the debugging and automated testing frameworks required to evaluate non-deterministic systems
Run "red-team" tests and benchmarks on models trained with trust-region-style policy optimization methods (e.g., PPO) and RL from Human Feedback (RLHF)
Work across the entire stack, from environment interfacing to policy optimization, with the opportunity to grow into Multi-Agent RL (MARL) technologies
Requirements
Strong proficiency in Python, NumPy, and PyTorch
Background in ML theory, Mathematics, or Physics
Experience with Bayesian ML models (e.g., Gaussian Processes, Bayesian Neural Networks)
Practical experience or familiarity with trust-region-style policy optimization methods (e.g., PPO) and RL from Human Feedback (RLHF)
Proven ability in debugging and setting up automated testing frameworks
Tech Stack
NumPy
Python
PyTorch
Benefits
Equal Opportunity Employer
Support diverse cultures, perspectives, skills and experiences
Senior AI Scientist – Reinforcement Learning at Resaro