Career Renew is recruiting for one of its clients, a fast-growing software company supporting and developing Hedera, an open-source, proof-of-stake public ledger. They are seeking a Senior Site Reliability Engineer to design, deploy, and ensure the reliability of multi-region infrastructure for large organizations across various sectors.

Responsibilities:

Design, build, and operate highly available, multi-region distributed systems with clear recovery strategies and tested RTO/RPO
Partner with the Head of SRE to define the reliability roadmap, platform architecture, and operational standards
Own large-scale Infrastructure as Code using Terraform, including reusable modules, multi-account patterns, and policy guardrails
Operate and scale Kubernetes environments (EKS, GKE, or AKS) using GitOps practices (ArgoCD), Helm, and strong RBAC and network policies
Build and maintain secure CI/CD pipelines, including blue/green and canary deployments, promotion and rollback strategies, and artifact integrity (SBOM, signing)
Define and improve SRE practices, including SLOs, error budgets, observability, and measurable reductions in MTTR/MTTA
Work closely with product and engineering teams to translate customer and business requirements into reliable, secure platform services
Contribute to the operational support and continuous improvement of customer-facing HashSphere deployments

Requirements:

Proven experience designing and building production-grade systems on Azure
Ability to turn ambiguous requirements into structured technical solutions and carry them through to delivered systems
Strong technical communication skills across engineering and non-technical stakeholders
High ownership mindset with a bias for action and accountability
Collaborative approach with a focus on building durable, scalable solutions
Azure cloud services (networking, compute, identity, security, storage)
Terraform (infrastructure as code at production scale)
Programming experience in Go and/or Python
Experience building greenfield infrastructure environments
Distributed systems, high-availability architectures, or platform engineering
CI/CD and automation tooling for infrastructure lifecycle management
Kubernetes and container orchestration
Observability tooling (Prometheus, Grafana)
Workflow/orchestration platforms (Argo, Spacelift, or similar)

Senior Site Reliability Engineer (Azure)
