Empower is seeking a Senior Manager for Site Reliability Engineering to lead multiple teams and set the strategic direction for reliability across value streams. The role involves developing talent, driving operational excellence, and partnering with stakeholders to improve reliability and scalability.

Responsibilities:

Lead SRE managers and senior individual contributors, develop leaders through coaching and delegation, and build a strong bench of technical and leadership talent
Drive talent reviews, succession planning, organizational design discussions, performance management, and development planning for high-potential team members
Foster a culture of operational excellence, innovation, and continuous learning across the organization
Define multi-quarter roadmaps and OKRs aligned with business objectives, and balance investments across reliability, scalability, cost, and security priorities
Develop strategies to reduce technical debt and operational toil, plan capacity and resources across teams, and lead initiatives that span multiple teams or value streams
Make build, buy, or partner decisions for platform capabilities and support technology investment decisions within the area of responsibility
Own reliability across multiple critical systems or value streams, including SLO and SLI frameworks, operational metrics, incident management, on-call practices, and disaster recovery and business continuity capabilities
Provide technical direction on major infrastructure decisions, participate in architecture reviews, and drive adoption of best practices across teams, including infrastructure as code, GitOps, observability, CI/CD, and zero-trust security architecture
Partner with Engineering, Product Management, Security, Compliance, Finance, and executive stakeholders on reliability requirements, roadmap planning, infrastructure initiatives, and broader technology strategy
Communicate vision, strategy, progress, incident summaries, and investment needs to stakeholders at all levels, including VP and Director audiences, while influencing technical and process decisions across the organization

Requirements:

3 to 5 years of management experience or equivalent, ideally including experience managing managers
8+ years of hands-on technical experience in Site Reliability Engineering, DevOps, or Operational Excellence
Proven track record of leading engineering teams through operational challenges
Deep technical knowledge of AWS, Kubernetes, and modern infrastructure practices
Strong understanding of SRE principles and experience applying them at scale
Experience setting strategy and executing multi-quarter initiatives
Demonstrated ability to develop managers and senior engineers
Excellent communication skills with the ability to influence senior leadership
Strong judgment in balancing competing priorities and making trade-offs
Experience managing distributed or hybrid teams
Experience managing SRE organizations with multiple teams
Financial services or other highly regulated industry experience
Deep understanding of compliance frameworks such as SOC 2, PCI DSS, and FINRA
Track record of leading transformational initiatives such as cloud migrations or platform rebuilds
AWS certifications at the professional level
Experience with large-scale Kubernetes implementations
Background in Agile or Scrum practices and metrics-driven management
Experience with FinOps and cloud cost optimization
Previous experience in Fortune 500 companies or similarly complex organizations
