Support platforms serving millions of customers and billions of requests each month, ensuring availability, scalability and resiliency.
Act as a key technical contributor within PEC, working with SRE guilds to improve cloud deployments, monitoring, CI/CD pipelines and cost efficiency.
Explore and adopt new technologies and practices to advance SRE capabilities, including AI‑driven tooling and automation.
Apply hands-on experience running high-throughput production systems to deliver customer value beyond proofs of concept (POCs).
Define and implement SLAs, SLOs and SLIs across software and data teams.
Improve incident management through better tooling, alerting, runbooks and automated remediation.
Act as a subject matter expert in site reliability engineering, contributing to technical discussions and fostering a culture of continuous learning across the lab.
Requirements
Proven hands-on experience in software development, testing, monitoring, and operational stability at scale.
Production experience with Kubernetes (k8s) and monitoring tools such as Datadog or Dynatrace.
Proven experience with, and knowledge of, automation, CI/CD and related best practices.
Proven experience of running postmortems, defining SLAs/SLIs/SLOs and participating in support rotas.
Coding and scripting experience (Python, Bash) gained in a commercial or industry setting.
Knowledge of databases, streaming and batch operations, and API design.
Proficient with Kubernetes (ideally microservice architectures using the Istio service mesh).
Extensive experience with cloud-native solutions (ideally Google Cloud).
Good understanding of cloud storage, networking, and resource provisioning.
Tech Stack
Cloud
Kubernetes
Python
Benefits
A generous pension contribution of up to 15%.
An annual bonus award, subject to Group performance.
Share schemes including free shares.
Benefits you can adapt to your lifestyle, such as discounted shopping.
30 days’ holiday, with bank holidays on top.
A range of wellbeing initiatives and generous parental leave policies.