OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The role involves building and operating storage services that underpin OpenAI’s research infrastructure, focusing on object storage systems, cross-region data movement, and lifecycle management capabilities.
Responsibilities:
- Build and operate storage services that underpin OpenAI’s research infrastructure
- Develop object storage systems across cloud and in-house environments
- Build systems for cross-region data movement, replication, and recovery
- Design lifecycle management capabilities that keep data durable, available, and cost-effective
- Evolve the federation layer that unifies multiple backend systems behind a simple interface
- Improve performance, reliability, and operational excellence across the platform
- Collaborate closely with researchers and infrastructure teams to support rapidly evolving workloads
Requirements:
- Experience building or operating distributed systems in production
- Background in storage infrastructure, object stores, distributed filesystems, or other data-intensive backend systems
- A preference for owning infrastructure end to end, including debugging and long-term reliability improvements
- Ability to write strong production code, ideally in Rust or another systems-oriented language
- Comfort working with Kubernetes-based systems
- Experience with tools such as Terraform, Grafana, or similar infrastructure and observability tooling