Nebius is leading a new era in cloud computing to serve the global AI economy. The company is seeking a Site Reliability Engineer to ensure fault tolerance, scalability, and uninterrupted operation of its services, using cutting-edge technology to solve a variety of infrastructure problems.

Responsibilities:

Ensure fault tolerance, scalability, and uninterrupted operation of our services
Use cutting-edge technology to solve a variety of infrastructure problems
Implement and improve CI/CD processes

Requirements:

Proficiency in Linux systems, with expertise in Python and Bash scripting for automation
Demonstrated ability to troubleshoot complex system issues, including hardware, software and networking problems
Strong analytical and problem-solving skills, with a focus on optimizing system performance
Working proficiency in English
Desire to be involved in backend development
Experience designing, developing and running high-load distributed systems