Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. Post-training researchers sit at the core of its roadmap, bridging raw model intelligence and practical systems that are useful and safe for humans. The role focuses on combining human insight with machine learning.

Responsibilities:

Design and execute data collection and synthesis strategies for post-training by combining human feedback, preference data, and synthetic examples to guide model behavior
Develop pipelines and frameworks for scalable, high-quality human labeling, model-assisted labeling, and synthetic data generation
Research and model human preferences and behavior, creating data-driven methods to improve reasoning, truthfulness, and helpfulness
Iterate on evals: post-training involves a never-ending loop of defining a set of evaluations, optimizing them, and then realizing your existing evals don’t capture what matters. You’ll be responsible both for making numbers go up and for making sure the numbers are meaningful
Design and evaluate metrics and benchmarks that measure data quality, alignment, and the real-world impact of post-training interventions
Scale and explore: post-training will involve a combination of scaling the existing methodologies and developing new ones
Publish and present research that moves the entire community forward. Share code, datasets, and insights that accelerate progress across industry and academia

Research, Post-Training Data
