Baseten powers mission-critical inference for leading AI companies, providing a platform for deploying advanced AI models. This role partners with stakeholders to improve open-source models through post-training research, with a focus on extracting insights from complex datasets and measurably improving model performance.
Responsibilities:
- Design and run post-training pipelines: SFT, GRPO, DPO, RLVR, reward function engineering, and synthetic data generation
- Build task-specific training environments and evals tailored to customer domains like healthcare, code generation, and legal, spanning multi-turn tool use, sandboxed execution, and agentic workflows
- Work directly with customers to translate production data into training signal, designing reward loops from real usage patterns and handling distribution shift
- Run and analyze training experiments end-to-end: diagnose reward hacking, importance sampling drift, and advantage estimation instabilities
- Publish findings at top venues and contribute to Baseten's open-source training libraries