Role Overview

Be strongly customer focused and drive a culture within your team that recognizes and promotes the importance of high availability and reliability for Twilio’s customer-facing services
Empower a team of highly skilled Engineers, motivating them to perform their best and provide support and guidance that enables them to self-organize and to achieve sustained high velocity
Lead employee career development by providing coaching and mentoring to junior engineers, while guiding senior contributors to deliver on their potential
Collaborate across teams on best practices to build, test and operate services at scale in AWS environments, enabling high performance and availability
Contribute to technical deep dives and right-sizing of the engineering investment relating to modernization initiatives and service enhancements
Collaborate with Product Managers, Architects and Product Engineering partners to develop and drive a technical roadmap for your team that is aligned with the wider Platform Engineering organization and enables the achievement of defined quarterly objectives and key results
Communicate effectively with your leaders and peers, as well as internally within your team, distilling complex thoughts and articulating concepts and project plans through written and verbal communication
Be highly data-driven, using metrics and Service Level Indicators and Objectives to identify gaps in systems, services and processes, and lead your team to develop and implement solutions that address them
Empower your team to participate actively in post-incident reviews and to identify and take ownership of learnings and follow-up actions that improve Twilio’s responsiveness to service failures
Carry out periodic audits and risk assessments to identify opportunities to improve the security and reliability of Twilio’s services
Drive initiatives that increase the use of automation to reduce toil and the manual intervention product engineering teams need to deploy and operate their services
As part of the Engineering Management team, foster leadership principles and behaviors throughout the organization and help to develop the next generation of leaders

Requirements

5+ years of management experience, including 3+ years leading a team focused on Reliability Engineering
Proven track record managing and responding to incidents
Experience with reliability modeling in distributed systems, including failure mode analysis, chaos engineering, graceful degradation and automated recovery
Depth of operational experience with complex distributed systems, including defining, measuring and monitoring SLIs and SLOs towards high availability and reliability goals
Experience building and managing systems with industry-standard deployment, operations and automation tools such as Kubernetes, Docker, Kafka and Terraform
Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations
Strong product and architectural vision and a demonstrated ability to communicate, advocate, and execute on that vision
High degree of ownership and customer obsession
Track record of building and sustaining high-performing teams, with the ability to empower others, demonstrated through clear communication, mentoring and coaching
Strong influencing and persuasion skills to manage Right vs Right Now engineering decisions
Excellent written and oral communication skills enabling you to articulate complex, technical material to a non-technical audience
Success participating in multi-functional teams; naturally collaborative but decisive when needed

Tech Stack

AWS
Distributed Systems
Docker
Kafka
Kubernetes
Terraform

Benefits

Competitive pay
Generous time off
Ample parental and wellness leave
Healthcare
A retirement savings program
And much more

Senior Engineering Manager, Reliability