Baseten powers mission-critical inference for leading AI companies, providing a platform for deploying advanced AI models. This role partners with stakeholders to improve open-source models through post-training research, with a focus on extracting insights from complex datasets and measurably improving model performance.
Responsibilities:
- Design and run post-training pipelines: SFT, GRPO, DPO, RLVR, reward function engineering, and synthetic data generation
- Build task-specific training environments and evals tailored to customer domains like healthcare, code generation, and legal, spanning multi-turn tool use, sandboxed execution, and agentic workflows
- Work directly with customers to translate production data into training signal, designing reward loops from real usage patterns and handling distribution shift
- Run and analyze training experiments end-to-end: diagnose reward hacking, importance sampling drift, and advantage estimation instabilities
- Publish findings at top venues and contribute to Baseten's open-source training libraries