OpenAI is an AI research and deployment company dedicated to ensuring that artificial general intelligence benefits all of humanity. In this role, you will help build the systems that convert large-scale infrastructure capacity into measurable, reliable token throughput for OpenAI workloads. The work spans performance benchmarking, tokenomics, model porting, and operational monitoring.
Responsibilities:
- Develop systems and tooling to measure, monitor, and improve token throughput across first-party and partner-owned compute environments
- Support performance benchmarking, tokenomics analysis, and model porting across heterogeneous infrastructure environments
- Build tooling to integrate external or partner infrastructure into OpenAI’s internal compute, observability, and workload management systems
- Develop and monitor operational metrics including billing, usage, SLAs, utilization, reliability, and throughput
- Identify bottlenecks across hardware, networking, software, and workload enablement that prevent raw capacity from being converted into productive token throughput
- Partner with compute, infrastructure, networking, finance, and operations teams to turn raw capacity into capacity that can serve production workloads
- Build dashboards, automation, and reporting systems that provide clear visibility into TaaS capacity, performance, and business outcomes
Requirements:
- Strong software engineering background with experience building systems, tooling, automation, or infrastructure platforms
- Experience working across compute infrastructure, distributed systems, performance engineering, or production operations
- Ability to reason about token throughput, utilization, benchmarking, infrastructure efficiency, and workload performance
- Comfortable integrating external systems or partner environments into internal infrastructure stacks
- Strong analytical and debugging skills across hardware, networking, software, and operational domains
- Experience with GPU clusters, AI infrastructure, performance benchmarking, or workload optimization
- Familiarity with model porting, inference/training workloads, tokenomics, or compute efficiency analysis
- Experience building monitoring systems for billing, usage, SLAs, utilization, or infrastructure reliability
- Background in systems engineering, infrastructure software, observability, distributed systems, or platform engineering