Own platform reliability across ingestion, processing, warehousing, governance, access, and business consumption — with a strong focus on automation, compliance, and operational excellence.
Lead 24x7 incident response, post‑incident reviews, and continuous improvement; maintain on‑call structures and runbooks.
Define and manage SLAs/SLOs, error budgets, and operational standards for ingestion, processing, warehousing, and consumption layers.
Implement automation for deployments, monitoring, scaling, recovery, and change management.
Establish unified observability (logs, metrics, traces, lineage) and dashboarding for platform health and capacity.
Requirements
Enterprise Cloud Data Platform Expertise
Proven experience operating multi‑platform enterprise data environments across Azure and AWS.