Nebius is a leader in cloud infrastructure for the AI economy, building a full-stack AI cloud platform for developers and enterprises. The company is seeking a Senior Software Engineer to design and develop a large-scale LLM training platform, maintain its ML infrastructure, and improve job scheduling strategies.
Responsibilities:
- Designing and developing a large-scale LLM training platform
- Maintaining our ML infrastructure, ensuring optimal performance, scalability and reliability
- Improving job scheduling strategies to minimize resource fragmentation
Requirements:
- 5+ years of professional software development experience
- Strong software engineering skills (we mostly use Python and Go)
- Proficiency in contemporary software engineering approaches, including CI/CD, version control and unit testing
- Experience with developing web services
- A commitment to maintaining extreme rigor in all job-related activities
- Previous experience working with language models or other similar NLP technologies
- A track record of building and delivering products (not necessarily ML-related) in a dynamic startup-like environment
- Strong engineering skills, including experience in developing large distributed systems or high-load web services
- Open-source projects that showcase your engineering prowess