Temu is seeking a Site Reliability Engineer to strengthen its production environment. The role covers the deployment, configuration, operations, monitoring, and troubleshooting of production components such as servers, networks, and storage, as well as collaboration with development teams to design highly available and scalable system architectures.

Responsibilities:

Deploy, configure, operate, monitor, and troubleshoot production environment components, including servers, networks, and storage
Collaborate with development teams to design highly available and scalable system architectures
Participate in the design and development of operations-related platforms and tools
Assist in the daily management and maintenance of operations-related platform systems

Requirements:

Minimum of 3 years of operations experience at a medium- to large-scale internet company
Familiarity with mainstream cloud platform services and their operational management, including compute instances, load balancers, networking, and object storage
Proficiency in high-availability technologies, with strong fault identification and troubleshooting skills
Expertise in Linux system operations and proficiency in scripting
Solid understanding of computer networking fundamentals, including TCP/IP protocols and common application-layer protocols such as HTTP/HTTPS
Proactive, strong sense of responsibility, team-oriented, with excellent communication and learning abilities
Experience with large-scale containerized production environments (e.g., Docker, Kubernetes)
Experience in developing or maintaining monitoring systems, workflow automation tools, or DevOps/O&M platforms
