OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. They are seeking a Backend Software Engineer to design and build an evals infrastructure that measures the quality of OpenAI’s support automation, collaborating closely with data science and research partners.

Responsibilities:

Design eval pipelines that are reliable, reproducible, and extensible
Build the infrastructure for continuous eval monitoring (regression and drift monitoring, robust golden datasets), along with feedback loops that ultimately strengthen support automation
Design, build, and maintain backend services and APIs to support intelligent automation and knowledge systems
Integrate and structure data across internal platforms, transforming it into formats optimized for use by downstream systems and AI workflows
Collaborate closely with data, research, and engineering teams to integrate OpenAI models into high-leverage workflows
Own the full development lifecycle of new backend systems and internal platform capabilities
Build with scale and maintainability in mind, while rapidly iterating on new ideas

Requirements:

4+ years of backend engineering experience at product-driven companies (excluding internships)
Proficiency in backend technologies; the tech stack includes Python, FastAPI, and Postgres
Experience designing and scaling distributed systems, APIs, or data processing pipelines
Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
Familiarity with evaluation methods for LLMs and with patterns like multi-agent workflows, tool use, or long context
Experience creating production evals and/or measuring performance of ML/LLM models at scale
A pragmatic mindset. You're comfortable shipping iteratively while building toward a long-term vision

Backend Software Engineer (Evals)
