About the team
The Seed Infrastructures team oversees the distributed training, reinforcement learning framework, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models.
We are looking for talented individuals to join our team in 2026. As a graduate, you will have opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at our company.
Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume.
Responsibilities
- Conduct research and development on large-scale AI infrastructure to support efficient training and post-training of foundation models, multimodal LLMs, and image/video generation models.
- Design and optimize distributed training strategies, including data/model/tensor/pipeline/expert parallelism, computation–communication overlap, and large-scale GPU cluster scaling.
- Prototype and improve end-to-end reinforcement learning (RL) training systems, covering rollout generation, policy optimization, evaluation, and iterative deployment workflows.
- Build scalable and fault-tolerant infrastructure that operates reliably under dynamic workloads and heterogeneous compute environments.
- Analyze performance bottlenecks across the training stack (e.g., networking, scheduling, GPU memory management), and develop principled optimization approaches to improve throughput, efficiency, and stability.
- Develop tooling, monitoring, debugging, and observability frameworks to ensure reliability of large-scale training and RL systems.
- Collaborate with researchers and engineers on system–algorithm co-design, translating research prototypes into scalable, production-ready infrastructure systems.
The base salary range for this position in the selected city is $177,688 - $341,734 annually.