Role Overview

Manage and operate Azure-based infrastructure (IaaS / PaaS), ensuring high availability and reliability
Monitor and maintain SLA/SLO compliance for cloud services
Perform incident, problem, and change management aligned to ITIL practices
Troubleshoot and resolve complex production issues across hybrid environments (on-prem + Azure)
Ensure stability as well as financial and security compliance across cloud workloads
Optimize operational performance, including capacity management, resource utilization, and cost control (FinOps alignment)
Design and implement cloud solutions in Azure (Landing Zones, networking, identity, compute)
Support delivery of cloud migration and modernization initiatives
Collaborate with architects and stakeholders to translate business requirements into technical solutions
Participate in solution validation, readiness, and go-live support
Ensure all solutions follow enterprise standards, governance, and security frameworks
Develop and maintain IaC templates using Bicep for Azure deployments
Standardize and automate infrastructure provisioning, configuration management, and environment consistency (Dev / Test / Prod)
Promote modular, reusable, and governed IaC practices
Integrate IaC into CI/CD pipelines (Azure DevOps)
Reduce manual provisioning and enforce repeatability and compliance
Design and implement CI/CD pipelines using Azure DevOps
Automate build, release, and infrastructure deployment processes
Enable GitOps / DevOps practices across cloud platform teams
Streamline operational tasks through scripting and automation (PowerShell / CLI)
Drive adoption of automated testing, version-controlled infrastructure, and continuous delivery
Implement and mature SRE principles across operations, including SLO / SLA definition and tracking, error budgets, and reliability engineering practices
Drive an automation-first approach to reduce manual intervention
Perform root cause analysis (RCA) and drive continuous improvement initiatives
Reduce incident volume and MTTR through engineering-driven solutions
Collaborate in blameless postmortems and reliability reviews

Requirements

Strong experience in Azure cloud operations and engineering, including management of IaaS/PaaS services in enterprise environments
Proven background in hybrid infrastructure environments (on-prem + Azure) with solid understanding of networking, identity, and security
Hands-on expertise in Infrastructure as Code (IaC) using Bicep (or ARM/Terraform), with a focus on standardization, automation, and governance
Experience designing and implementing CI/CD pipelines using Azure DevOps, including automation of infrastructure and application deployments
Solid knowledge of cloud observability and monitoring practices, using tools such as Azure Monitor, Log Analytics, Application Insights, and/or Grafana
Experience applying Site Reliability Engineering (SRE) principles, including SLO/SLA management, incident reduction, and automation-driven operations
Strong troubleshooting and problem-solving skills across cloud and platform services, with the ability to manage complex production environments
Familiarity with ITIL-based operational processes (Incident, Problem, Change), ensuring alignment between operations and engineering
Experience driving cloud optimization initiatives, including performance, scalability, and cost (FinOps awareness)
Strong collaboration and communication skills, with the ability to work across engineering, operations, and customer stakeholders

Tech Stack

Azure
Cloud
Grafana
Terraform

Benefits

Equal Opportunity Employer
Opportunities for innovation and modernization

Senior Hybrid Cloud Platform Engineer