Nebius is leading a new era in cloud computing to serve the global AI economy. The company is seeking a Site Reliability Engineer to keep its services fault-tolerant, scalable, and continuously available, using cutting-edge technology to solve a wide range of infrastructure problems.
Responsibilities:
- Ensure fault tolerance, scalability, and uninterrupted operation of our services
- Use cutting-edge technology to solve a variety of infrastructure problems
- Implement and improve CI/CD processes
Requirements:
- Proficiency in Linux systems, with expertise in Python and Bash scripting for automation
- Demonstrated ability to troubleshoot complex system issues, including hardware, software, and networking problems
- Strong analytical and problem-solving skills, with a focus on optimizing system performance
- Working proficiency in English
- Desire to be involved in backend development
- Experience designing, developing, and running high-load distributed systems