MongoDB is a leading database platform that empowers innovation at the speed of the market. The Senior Site Reliability Engineer will support, maintain, and grow the Atlas platform by designing and building complex systems while ensuring high availability and minimal disruption for customers.
Responsibilities:
- Participate in the development of a reliable and resilient multi-cloud platform that hosts business-critical applications for a wide and varied range of customers
- Collaborate with service-owning teams to provide internal support, solve technical challenges, and adapt or build tooling that addresses novel use cases in a generic fashion
- Participate in a 24/7 on-call rotation to swiftly resolve issues affecting our customer-facing Atlas fleet, ensuring minimal disruption and high availability
Requirements:
- Have 5+ years of experience running critical systems at scale
- Value efficiency in processes and operations, and display a preference for automation over manual processes (“allergic to ops work”)
- Be familiar with a major cloud provider (AWS, Azure, or GCP) and be able to build and operate systems in a multi-cloud environment
- Have a strong understanding of how to run a large-scale Linux environment, including low-level fundamentals
- Have a firm grasp of at least one modern programming language (Go, Ruby, Python), beyond basic scripting
- Have a solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)