About this role
We are looking for talented individuals to join our team in 2027. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at our Company.
Successful candidates must be able to commit to an onboarding date by the end of 2027. Please state your availability and graduation date clearly in your resume.
Team Introduction: ByteDance Networking brings together innovative ideas and technologies from network architecture, software-defined networking (SDN), network virtualization, switch software and hardware co-design, and high-speed networking to create hyperscale data center networking solutions. These solutions power some of the world's most popular apps, such as Douyin and TikTok, which serve hundreds of millions of users around the globe.
ByteDance Networking is responsible for designing, building, and operating the global, intelligent network infrastructure to meet the requirements of high availability, scalability, and high performance. By joining this team, you will gain marketable software development and/or network operations experience in data center networking at massive scale.
Topic Content: With the large-scale adoption of LLMs and AI agents, traditional cloud-native infrastructure can no longer meet the ultra-high performance and elasticity requirements of AI workloads.
Network and Observability: Research intelligent fault localization and root cause analysis for large-scale AI clusters, combined with intelligent tuning of time-series databases to improve cluster stability.
This topic aims to build a next-generation AI-native infrastructure to support the deployment of LLMs and AI agents, improve resource utilization, reduce costs, support elastic scaling, and drive the technological evolution of AI infrastructure.
Responsibilities:
- Design, implement, and deploy high-speed network technologies to support AI/LLM applications.
- Design and develop platforms and systems for monitoring, analyzing, and diagnosing large-scale AI/LLM networks.
- Research and develop high-performance AI communication frameworks, network protocol stacks, and host-network-application co-design optimizations to improve the scalability, reliability, and performance of AI/LLM networks.
- Build next-generation AI network infrastructure that supports large-scale heterogeneous network hardware with innovative and deployable solutions.
The base salary range for this position in the selected city is $202,160 - $368,220 annually.