Research Engineer – Reinforcement Learning (RL) Systems & Infrastructure (Seed Infra)
San Jose, California, United States of America
Full Time
3 weeks ago
$244,800 - $450,000 USD
Key skills
AI
About this role
About the Team
The Seed Infrastructures team oversees the distributed training, reinforcement learning framework, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models.
Responsibilities
- Design and build end-to-end reinforcement learning (RL) systems for large-scale models, covering rollout, training, evaluation, and deployment pipelines.
- Develop scalable and fault-tolerant RL infrastructure that operates efficiently under dynamic workloads and heterogeneous compute environments.
- Optimize distributed training performance across GPU clusters, improving throughput, resource utilization, and system stability.
- Collaborate with cross-team researchers on targeted system–algorithm co-design to translate research ideas into robust, production-grade implementations.
- Build tooling, monitoring, and debugging frameworks to ensure reliability and observability of large-scale RL training systems.
The base salary range for this position in the selected city is $244,800 - $450,000 annually.