About the Team
The mission of the Seed Speech team is to enrich interactive and creative processes through the application of multimodal speech technologies. The team focuses on the forefront of research and product development in speech and audio, music, natural language understanding, and multimodal deep learning.

We are looking for talented individuals to join our team in 2026. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at ByteDance.

Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume.

Responsibilities
- Contribute cutting-edge research to ByteDance product evolution (e.g., Douyin, CapCut, and more) to impact billions of users worldwide.
- Work on advanced science and technology in audio processing and generation (e.g., Dialogue Systems, Audio-Video Models, Speech Synthesis, Voice Conversion, Audio Codec Learning, and Audio Language Modeling).
- Research, model, design, develop and evaluate novel machine learning models and algorithms.
- Collaborate with globally based researchers and engineering teams in developing machine learning models and algorithms.

The base salary range for this position in the selected city is $208,800 - $438,000 annually.

Research Scientist Graduate (Foundation Model-Speech-Interaction & Learning) - 2026 Start (PhD)
