Lead Site Reliability Engineer at Alteryx | JobVerse
Lead Site Reliability Engineer
Alteryx
Remote
United States
Full Time
2 weeks ago
$136,000 - $177,000 USD
Visa Sponsor
Key skills
Cloud
Distributed Systems
Grafana
Java
JavaScript
Kubernetes
Python
C++
C
AI
ArgoCD
Datadog
GitOps
SaaS
CI/CD
Leadership
Mentoring
About this role
Role Overview
Define and drive reliability strategy across control-plane and data-plane systems, including multi-region resilience, business continuity and disaster recovery (BCDR), and failover design
Establish and operationalize SLOs, SLAs, and error budgets, ensuring they inform planning and engineering tradeoffs
Lead initiatives that measurably reduce MTTR, prevent incidents, and improve overall service health
Own incident management end-to-end, driving systemic fixes and long-term reliability improvements beyond immediate response
Lead architecture and design reviews to ensure systems meet scalability, reliability, and cost efficiency goals
Champion automation and modernization, including AI-driven reliability improvements
Establish and enforce code quality and review standards
Lead cross-functional initiatives and align engineering with product priorities
Mentor senior engineers and act as a technical leader across teams
Requirements
6+ years leading delivery of complex, distributed systems or SaaS platforms
Strong experience with multi-region, split-plane architectures (control-plane / data-plane)
Proven track record improving SLOs, MTTR, and system reliability at scale
Proficiency in languages such as Python, Java, C++, or JavaScript
Deep experience with:
Kubernetes (multi-cluster), CI/CD, and GitOps (ArgoCD)
SLO/SLA design, observability, and incident management
Infrastructure as Code and cloud platforms
Disaster recovery, resilience, and security best practices
Strong leadership skills with experience mentoring senior engineers and influencing cross-team decisions
Nice to Have
Experience with chaos engineering and large-scale reliability automation
Background in enterprise SaaS platforms or split-plane architectures
Expertise in navigating, understanding, and using modern observability platforms (e.g., Datadog, Grafana)
Tech Stack
Cloud
Distributed Systems
Grafana
Java
JavaScript
Kubernetes
Python
Benefits
Bonus or commission
Medical
Retirement
Financial
Wellness
Time off
Employee discounts