Temu is seeking a Site Reliability Engineer to maintain and improve its production environment. The role covers the deployment, configuration, operation, monitoring, and troubleshooting of production components, as well as close collaboration with development teams to design scalable system architectures.
Responsibilities:
- Deploy, configure, operate, monitor, and troubleshoot production environment components, including servers, networks, and storage
- Collaborate with development teams to design highly available and scalable system architectures
- Participate in the design and development of operations-related platforms and tools
- Assist in the daily management and maintenance of operations-related platform systems
Requirements:
- Minimum of 3 years of operations experience at a medium- to large-scale internet company
- Familiarity with mainstream cloud platform services and their operational management, including compute instances, load balancers, networking, and object storage
- Proficiency in high-availability technologies, with strong fault identification and troubleshooting skills
- Expertise in Linux system operations and strong scripting skills
- Solid understanding of computer networking fundamentals, including TCP/IP protocols and common application-layer protocols such as HTTP/HTTPS
- Proactive and team-oriented, with a strong sense of responsibility and excellent communication and learning abilities
- Experience with large-scale containerized production environments (e.g., Docker, Kubernetes)
- Experience in developing or maintaining monitoring systems, workflow automation tools, or DevOps/O&M platforms