MongoDB is built for change, empowering our customers and our people to innovate at the speed of the market. We are seeking an experienced Senior Site Reliability Engineer to support, maintain, and grow the Atlas platform, ensuring a reliable and resilient multi-cloud environment.
Responsibilities:
- Support, maintain, and grow the Atlas platform
- Design and build complex systems, operate with autonomy, and act as the owner of everything you do
- Provide expertise in running systems at scale, build new tooling and automation, and perform essential maintenance of the Atlas fleet
- Participate in the development of a reliable and resilient multi-cloud platform that hosts business-critical applications for a wide and varied range of customers
- Collaborate with service-owning teams to provide internal support, solve technical challenges, and adapt or build tooling that addresses novel use cases in a generic, reusable way
- Participate in a 24/7 on-call rotation to swiftly resolve issues affecting the customer-facing Atlas fleet, minimizing disruption and maintaining high availability
Requirements:
- 5+ years of experience running critical systems at scale
- A focus on efficiency in processes and operations, with a strong preference for automation over manual processes ('allergic to ops work')
- Familiarity with a major cloud provider (AWS, Azure, or GCP) and the ability to build and operate systems in a multi-cloud environment
- A strong understanding of how to run a large-scale Linux environment, including low-level fundamentals
- A firm grasp of at least one modern programming language (e.g., Go, Ruby, or Python), beyond basic scripting
- A solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)