Cognition is an applied AI lab building end-to-end software agents, including Devin, the first AI software engineer. The role focuses on post-training, bridging model capability and real-world effectiveness by developing training recipes, evaluations, and alignment methods for AI agents.
Responsibilities:
- Post-Training Recipe Development: Iterate on the full stack of datasets, training stages, and hyperparameters that determine model behavior. Measure how choices compound across evals and production performance, not just on isolated benchmarks.
- Evaluation Design and Integrity: Build evals that actually capture what matters. The loop never ends: define, optimize, find the gaps, and rebuild. You'll be responsible both for making the numbers go up and for making sure they mean something.
- Deep Understanding: When training produces results that don't make sense, you dig until you understand why. The goal isn't just to fix it; it's to carry that understanding forward to the next problem.
- Alignment and Agent Behavior: Apply and advance techniques like RLHF, RLAIF, and constitutional approaches to shape how agents reason, act, and collaborate with humans on long-horizon tasks.
- Scaling and Exploration: Measure how performance scales with data and compute, and develop new methodologies when existing ones hit ceilings. We expect both rigor and invention.