MongoDB empowers customers and employees to innovate at the speed of the market. The company is seeking a talented Site Reliability Engineer with a strong networking background to build and maintain robust infrastructure for secure communication between services. The role involves collaborating with service-owning teams, providing internal support, and participating in a 24/7 on-call rotation to ensure high availability and minimal disruption.

Responsibilities:

Participate in the development of a reliable and resilient multi-cloud, globally connected network that is crucial for MongoDB’s services
Collaborate with service-owning teams to provide internal support, addressing technical issues and offering guidance on best practices for service-to-service connectivity
Participate in a 24/7 on-call rotation to swiftly resolve issues related to network architecture and service-to-service connectivity, ensuring minimal disruption and high availability

Requirements:

Have 6+ years of experience working on software and operating distributed systems, with deep expertise in networking fundamentals and a good understanding of how the internet works, e.g. TCP/IP (including IPv6), DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles
Possess a customer-focused mindset, driving improvements that benefit end-users
Value efficiency in processes and operations, and display a strong preference for automation over manual processes (“allergic to ops work”)
Be intimately familiar with modern cloud-based infrastructure and the network design primitives of at least one of AWS, Azure, or GCP, e.g. VPCs, subnetting, routing, VPNs, peering, private link / private service connect, and CDNs
Have strong knowledge of service mesh and load-balancing concepts, and be eager to implement these in a multi-cloud environment

Site Reliability Engineer (Senior or Staff), Fabric
