About the Team
The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products.

Responsibilities
- Research and develop large-scale multimodal foundation models
- Develop unified modeling frameworks that integrate video, audio, and language, with a focus on visual latent reasoning
- Explore reinforcement learning-based approaches to bridge understanding and generation for multimodal visual reasoning
- Collaborate with researchers to evaluate models on tasks involving world modeling, reasoning, and instruction-conditioned generation

The base salary range for this position in the selected city is $208,800 - $438,000 annually.

Research Scientist - Seed Multimodal Interaction and World Model

About this role