Head of SRE at Wand AI | JobVerse
Head of SRE
Wand AI
Remote
Germany
Full Time
7 hours ago
No H1B
Key skills
AWS
Azure
Cloud
Kubernetes
Terraform
ML
MLOps
CI/CD
Leadership
Communication
About this role
Role Overview
Own and lead all SRE-related strategy, standards, and execution.
Embed SRE culture and operational excellence across engineering teams.
Review the current infrastructure and operational model; redesign and rebuild where needed.
Architect, deploy, and maintain scalable, secure production environments.
Define and implement SLIs, SLOs, and uptime targets.
Establish robust monitoring, alerting, and observability practices.
Design and implement incident management, root cause analysis (RCA), and postmortem processes.
Build and manage sustainable on-call frameworks and escalation models.
Automate the software delivery lifecycle to improve release predictability and safety.
Create reproducible environments and infrastructure-as-code (IaC) provisioning templates.
Improve system performance, availability, and reliability.
Support and productionise data platforms and ML workloads.
Partner closely with QA and Engineering leadership to improve release quality and stability.
Ensure infrastructure meets enterprise-grade security and regulatory requirements.
Hire, manage, and mentor a team of site reliability engineers.
Requirements
Proven hands-on experience in Site Reliability Engineering, Production Engineering, or a similar role.
Strong hands-on expertise in cloud infrastructure (AWS or Azure preferred), infrastructure as code (Terraform), and Kubernetes.
Experience building or maturing SRE practices within an organisation.
Demonstrated ability to improve uptime, reliability, and operational processes.
Deep understanding of CI/CD, developer experience, infrastructure as code, and automation.
Experience designing on-call processes and incident response frameworks.
Experience managing at least one team of site reliability engineers.
Strong communication skills, with the ability to influence across teams.
Experience supporting data platforms and ML systems in production environments.
MLOps experience (model deployment, monitoring, retraining workflows).
Tech Stack
AWS
Azure
Cloud
Kubernetes
Terraform
Benefits
Health insurance
Flexible working hours
Paid time off
Professional development opportunities