Nebius is leading a new era in cloud computing to serve the global AI economy. The company is seeking a Senior Software Engineer to design, build, and own the backend systems that power metrics and monitoring for large-scale infrastructure.
Responsibilities:
- Design and build services and agents that provide deep visibility into large-scale server fleets and data center engineering systems
- Evolve metrics, aggregation, and alerting pipelines, with a focus on signal quality and reliability
- Design and operate maintenance and remediation systems that enable safe, predictable fleet-wide changes and keep infrastructure healthy
- Investigate production incidents hands-on, including on-host Linux debugging, and drive root-cause fixes
- Collaborate closely with hardware, networking, and data center operations teams to improve reliability
Requirements:
- 5+ years of professional software engineering experience
- Strong production experience with Python and Go, or the ability to ramp up on them quickly
- Solid Linux fundamentals and comfort debugging live systems
- Ability to write reliable, maintainable code and dig into complex, ambiguous problems
- Experience building and operating production systems at scale
- Ubuntu experience, including internal tooling and packaging workflows (e.g., building Debian packages)
- CCNA (Cisco Certified Network Associate) or equivalent networking experience