Veeam Software is the #1 global market leader in data resilience, providing backup, recovery, and security solutions for businesses. The Senior Staff Site Reliability Engineer will set the direction for Veeam’s global SRE function, collaborating with product and platform engineering leads to enhance the architecture and reliability of the Veeam Data Cloud.
Responsibilities:
- Guide the strategic reliability roadmap across services, collaborating with Staff SREs and product engineering leadership
- Lead deep dives into service architecture and operational behaviors to identify opportunities for system-wide improvements
- Set architectural guardrails and partner with engineering leadership to design resilient, scalable systems that operate globally
- Mentor Staff SREs and serve as the connective tissue between their areas of expertise
- Drive alignment across teams and functions through leadership, coaching, and technical authority
- Advocate for a blameless, data-informed culture of continuous improvement
- Lead and evolve chaos engineering, resilience testing, and system validation programs
- Establish visibility mechanisms (dashboards, SLO reporting, scorecards) that track Veeam's reliability posture
- Represent the SRE discipline in executive and cross-functional settings, influencing org-level decisions
- Collaborate with platform, security, and infrastructure teams to build shared tooling and processes
- Contribute thought leadership internally and externally through presentations, white papers, and conference talks
Requirements:
- 12+ years in engineering roles, including 3+ years in a Staff+ or Principal-level position
- Expertise in large-scale distributed systems, cloud infrastructure, and resilience strategies
- Proven ability to set technical direction and align diverse teams around architectural goals
- Experience mentoring Staff+ engineers and guiding multi-team initiatives
- Advanced skills in systems architecture, cloud-native development, and automation tooling
- Excellent communication skills, with the ability to influence both technical and non-technical stakeholders
- Experience building a global SRE practice
- Participation in industry forums, open source, or speaking engagements
- Deep knowledge of service ownership, SLIs/SLOs, and organizational scaling challenges