OpenAI is an AI research and deployment company dedicated to ensuring that artificial general intelligence benefits all of humanity. As a member of the Agent Post-Training, Artifacts team, you will train frontier models to create polished, useful work products and improve agent behavior through experiments and cross-team collaboration.

Responsibilities:

Design and run experiments that improve agentic model behavior for complex software and plugins
Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis
Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions
Partner with Codex and ChatGPT product teams to understand what users need and translate product signal into model improvements
Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior
Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs
Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness
Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments
Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes

Agent Post-Training, Artifacts Research
