MongoDB is built for change, empowering our customers and our people to innovate at the speed of the market. We are seeking an experienced Senior Site Reliability Engineer to support, maintain, and grow the Atlas platform, ensuring a reliable and resilient multi-cloud environment.

Responsibilities:

Support, maintain and grow the Atlas platform
Design and build complex systems, operate with autonomy, and act as owner for everything you do
Provide expertise about running systems at scale, build new tooling and automation, and perform essential maintenance of the Atlas fleet
Participate in the development of a reliable and resilient multi-cloud platform that hosts business-critical applications for a wide and varied range of customers
Collaborate with service-owning teams to provide internal support, solve technical challenges, and adapt or build tooling to solve novel use cases in a generic fashion
Participate in a 24/7 on-call rotation to swiftly resolve issues affecting our customer-facing Atlas fleet, ensuring minimal disruption and high availability

Requirements:

5+ years of experience running critical systems at scale
A commitment to efficiency in processes and operations, and a preference for automation over manual processes ('allergic to ops work')
Familiarity with a major cloud provider (AWS, Azure, or GCP) and the ability to build and operate systems in a multi-cloud environment
A strong understanding of how to run a large-scale Linux environment, including low-level fundamentals
Firm grasp of at least one modern programming language, beyond basic scripting (such as Go, Ruby, or Python)
Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)

Site Reliability Engineer (Senior or Staff), Atlas
