WEX is a leading company in the mobility sector, seeking a Senior Director of Site Reliability Engineering to lead their engineering efforts in ensuring the resilience of their global mobility platform. This role involves strategic leadership in defining the SRE roadmap, overseeing infrastructure for millions of concurrent trips, managing incidents, and optimizing cloud efficiency while fostering a productive engineering team.

Responsibilities:

Define the multi-year SRE roadmap, pivoting from reactive firefighting to proactive, automated platform health
Oversee infrastructure that supports millions of concurrent trips across diverse geographic regions, accounting for local regulatory and latency requirements
Own the end-to-end incident lifecycle. You won't just manage the 'Big Outages'; you’ll foster a blameless culture focused on root-cause analysis (RCA) and permanent remediation
Deploy AI/ML models that analyze historical telemetry data to predict capacity 'hotspots' and system fatigue, hours before they manifest
Partner with Product and Engineering VPs to balance innovation speed with reliability via strictly enforced Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Work closely with product and commercial partners to drive and prioritize work, working backwards from customer requirements to exceed expected outcomes
Drive effective weekly, monthly, and quarterly mechanisms to plan, execute, and audit workstreams
Optimize a massive global cloud footprint (AWS/GCP/Azure), ensuring performance doesn't come at the cost of unsustainable burn
Champion 'Infrastructure as Code' (IaC) and self-service tooling so that developers can deploy safely without manual intervention
Establish a robust and clear engineering roadmap to maintain clarity and motivation for the engineering team. Maintain career growth plans and provide monthly and quarterly feedback for individuals’ continual progress
Establish metrics-driven measurement of developer productivity across the Mobility SRE org
Present to, influence, and communicate with the senior leadership team with confidence. Provide regular updates and insights to senior leadership on the challenges and opportunities within the Mobility domain. Effectively manage up, across, and down with tangible written strategy documents or plans

Requirements:

BS/MS in Computer Science, Engineering, or equivalent practical experience
12+ years in SRE, with at least 5 years in a senior leadership role (Director or above) managing managers
Proven track record of managing distributed systems at a 'Hyper-scale' level (e.g., millions of requests per second)
Expertise in rapid development and deployment using cloud computing platforms such as AWS or Azure
Deep understanding of Kubernetes, service mesh (Istio/Linkerd), edge computing, and global traffic management
Excellent leadership, team-building, and dynamic decision-making skills
Ability to deal with ambiguity and thrive in a fast-paced, dynamic environment
Excellent verbal and written communication skills
Experience with high-concurrency, geospatial, or real-time marketplace dynamics is a significant plus
Experience building high-performance distributed systems at internet-scale companies
Experience building credit card products, or experience developing solutions in a scheme/network
Experience building or managing fleet systems
Experience working on closed-loop card systems

Sr. Director, Site Reliability Engineering, Mobility
