About this role
Role Overview
Hire and manage a high-performing team of Site Reliability Engineers in India that lives our values.
Hold regular 1:1s with every member of your team, providing coaching and ongoing feedback on each individual's performance.
Coordinate and continuously refine the team’s shift and weekend coverage model for Dedicated migrations.
Own operational execution of Dedicated Geo migrations and cutovers, including planning, pre-cutover preparation, live execution, and post-cutover validation and cleanup.
Ensure the team provides high-quality, timely responses to Geo-related escalations from Support and internal partners.
Foster technical decision-making on the team, stepping in to make final decisions when necessary, especially during high-stakes migrations or incidents.
Build and maintain runbooks, guardrails, and post-cutover reviews so the team operates with rigor rather than improvisation, especially during ramp-up.
Collaborate with core Geo, Dedicated migrations, and other Infrastructure teams to identify and prioritize engineering investments that improve migration tooling and processes.
Define, track, and report on key operational metrics such as escalation volume absorbed, internal escalation rate, cutover coverage, response times, and team health signals, using them to drive continuous improvement.
Participate in the Incident Management on-call rotation to help ensure availability goals for GitLab.com are met, working with reliability engineers and development team members.
Requirements
3+ years of experience managing SRE, infrastructure, or platform engineering teams operating highly available distributed systems at scale, ideally in a SaaS environment with customer-facing SLAs.
Demonstrated ability to lead in a remote, high-performance environment, collaborating across multiple time zones and cultures.
Experience running or significantly contributing to large-scale data migrations where customer data integrity and downtime risk must be carefully managed.
Strong infrastructure background, including cloud platforms, observability, incident response, and distributed multi-tenant architectures.
Excellent communication and interpersonal skills, with the ability to translate complex technical concepts and risk trade-offs into clear, actionable insight for both technical and non-technical stakeholders, including customers.
Strong problem-solving abilities and attention to detail, with a focus on delivering high-quality, low-risk operational outcomes in a fast-paced, dynamic environment.
Alignment with our company values and a commitment to working in accordance with those values.
Tech Stack
Cloud
Distributed Systems
Benefits
Benefits to support your health, finances, and well-being
Flexible Paid Time Off
Team Member Resource Groups
Equity Compensation & Employee Stock Purchase Plan