Cognition is an applied AI lab building end-to-end software agents, including Devin, the first AI software engineer. This role focuses on post-training: bridging the gap between model capability and real-world effectiveness by developing training recipes, evaluations, and alignment methods for AI agents.

Responsibilities:

Post-Training Recipe Development: Iterate on the full stack of datasets, training stages, and hyperparameters that determine model behavior. Measure how these choices compound across evals and production performance, not just on isolated benchmarks.
Evaluation Design and Integrity: Build evals that actually capture what matters. The loop never ends: define, optimize, discover the gaps, and rebuild. You'll be responsible both for making the numbers go up and for making sure the numbers mean something.
Deep Understanding: When training produces results that don't make sense, you dig until you understand why. The goal isn't just to fix the problem; it's to carry that understanding forward to the next one.
Alignment and Agent Behavior: Apply and advance techniques such as RLHF, RLAIF, and constitutional approaches to shape how agents reason, act, and collaborate with humans on long-horizon tasks.
Scaling and Exploration: Measure how performance scales with data and compute, and develop new methodologies when existing ones hit their ceilings. We expect both rigor and invention.

Research, Post-Training
