Thinking Machines Lab's mission is to empower humanity by advancing collaborative general intelligence. Post-training researchers sit at the core of their roadmap: the role bridges raw model intelligence and practical systems that are useful and safe for humans, drawing on both human insight and machine learning.
Responsibilities:
- Design and execute data collection and synthesis strategies for post-training by combining human feedback, preference data, and synthetic examples to guide model behavior
- Develop pipelines and frameworks for scalable, high-quality human labeling, model-assisted labeling, and synthetic data generation
- Research and model human preferences and behavior, creating data-driven methods to improve reasoning, truthfulness, and helpfulness
- Iterate on evals: post-training involves a never-ending loop of defining a set of evaluations, optimizing for them, and then realizing your existing evals don’t capture what matters. You’ll be responsible both for making numbers go up and for making sure the numbers are meaningful
- Design and evaluate metrics and benchmarks that measure data quality, alignment, and the real-world impact of post-training interventions
- Scale and explore: post-training will involve a combination of scaling the existing methodologies and developing new ones
- Publish and present research that moves the entire community forward. Share code, datasets, and insights that accelerate progress across industry and academia