Reflection AI is on a mission to build open superintelligence and make it accessible to all. They are seeking a Member of Technical Staff to build and scale distributed training systems for foundation models, collaborating closely with research teams and optimizing large-scale training workloads.
Responsibilities:
- Build and scale distributed training systems that power frontier model pre-training
- Work closely with research teams to design and operate large-scale training runs for foundation models
- Develop infrastructure that enables efficient training across thousands of GPUs using modern distributed training frameworks
- Optimize throughput, stability, and efficiency for large-scale model training workloads
- Collaborate directly with pre-training researchers to translate experimental ideas into scalable, production-ready training systems
- Improve the performance of distributed training workloads by optimizing communication, memory usage, and GPU utilization
- Build and maintain training pipelines that support large-scale datasets, checkpointing, and experiment iteration
- Debug and resolve performance bottlenecks across the distributed training stack, including model parallelism, GPU communication, and training runtime systems
- Contribute to the development of systems that enable rapid experimentation and iteration on new training techniques