About this role

Design and implement evaluations and tests that use synthetic personas to simulate different user profiles, goals, and behaviors
Develop experimental LLM-as-a-Judge pipelines, including prompt engineering, rubric design, and automated evaluation protocols for scalable analysis of AI-generated responses
Research, combine, and apply reinforcement learning and preference optimization techniques to continuously improve language models, analyzing their impact on quality, robustness, and alignment

Education: PhD
Fields of study: Computer Science, Computer Engineering, Information Systems, Data Science, Statistics, Applied Mathematics, or related areas in Computing, Artificial Intelligence, or Machine Learning
Programming: Python
Knowledge of Machine Learning and Deep Learning fundamentals
Familiarity with Large Language Models (LLMs) and Generative AI
Experience with AI model development libraries (PyTorch and/or Hugging Face Transformers)
Ability to read and comprehend scientific papers in English
Basic knowledge of experimental design, results analysis, and model evaluation
Experience with LLM evaluation / LLM-as-a-Judge (a plus)
Experience with RLHF, DPO, or preference optimization (a plus)
Participation in scientific research/publications in AI or Machine Learning (a plus)
Experience with coding agents (Claude Code, GitHub Copilot, Codex, etc.) (a plus)

Doctoral Research Fellow – LLM, Model Evaluation, Python, Prompt Engineering

Key skills