Twilio is shaping the future of communications by delivering innovative solutions to businesses and empowering developers worldwide. The Senior Cloud Software Engineer will lead the design and implementation of scalable cloud infrastructure, ensuring high availability and reliability of services that handle billions of requests weekly.
Responsibilities:
- Lead Execution End-to-End: Design and lead the implementation of scalable, high-availability cloud infrastructure, moving beyond feature work to drive long-term platform strategy while championing Twilio’s "progress over perfection" mindset to deliver continuous, high-impact shipments
- Operational Excellence: Own the reliability of services handling billions of weekly requests, setting the standard for operational best practices and incident response
- Infrastructure Strategy: Drive the evolution of our Infrastructure as Code (IaC) patterns using Terraform, ensuring modularity, security, and reusability across the organization. Always aim for a self-service approach that keeps the team’s toil to a minimum
- Continuous Optimization: Continuously look for optimizations in our pipelines, releases, and deliveries that balance rapid deployment with rigorous safety checks
- Technical Mentorship and Reviews: Actively mentor L1 and L2 engineers and support them through code reviews, design docs, and pair programming to foster a culture of technical excellence
- Cross-Functional Influence: Collaborate across teams and with Product and Engineering leadership to align the technical roadmap, debt reduction, and new feature delivery
- Continuous Innovation: Be an owner, continuously researching and prototyping ways to optimize Twilio’s API infrastructure and provide best-in-class service to customers
Requirements:
- 5+ years of professional experience in Cloud, DevOps, or Site Reliability Engineering (SRE), with deep proficiency in Python, Java, Go, or another language of choice
- Architectural Depth: Proven track record of designing and deploying complex AWS cloud-native solutions (e.g., CloudFront, multi-tenancy, DNS, caching strategies, WAF, Lambda, S3, developing hosting solutions, etc.) at scale
- IaC Expert: Advanced experience with Terraform, including writing custom providers or managing state at scale across multiple environments
- System Reliability: Deep understanding of SLIs, SLOs, golden signals, and error budgets, and of establishing monitoring strategies; in-depth experience using Datadog, Grafana, or Athena to drive data-informed engineering decisions
- Strong background in microservices architecture, specifically regarding traffic routing, rate limiting, and service discovery
- Demonstrated ability to lead technical projects from conception to completion, navigating trade-offs and communicating complex technical concepts to non-technical stakeholders
- Experience implementing security-at-scale, including WAF management, DDoS mitigation, and Zero Trust architectures
- Previous Platform Engineering experience and a passion for building 'Internal Developer Platforms' that abstract infrastructure complexity for product teams
- A proven track record of contributing to the cloud-native or networking ecosystem, including contributions to public Terraform providers or similar open-source initiatives