McKesson is an impact-driven Fortune 10 company focused on making quality healthcare more accessible and affordable. The company is seeking a Senior Database Site Reliability Engineer (DB SRE) to ensure the reliability and operational maturity of its Azure PostgreSQL platforms. The role applies SRE principles to improve database services and works closely with other engineering teams.
Responsibilities:
- Own and continuously improve the reliability, availability, and performance of Azure PostgreSQL platforms across dev, stage, and prod
- Design, build, and operate cloud database infrastructure using Infrastructure as Code (Terraform)
- Apply SRE principles to stateful systems, including environment isolation, blast‑radius reduction, and automation-first operations
- Define and implement database observability (metrics, logs, dashboards, alerts) using enterprise monitoring tools (e.g., Datadog)
- Lead incident response for database-related production issues and participate in on‑call rotations
- Troubleshoot complex issues across performance, replication, connectivity, failover, and permissions
- Define and validate high availability, backup, restore, disaster recovery, and point‑in‑time recovery (PITR) strategies
- Enforce least‑privilege access, support audits, and ensure compliance with security and governance requirements
- Collaborate with platform, application, security, and network teams to design scalable, secure database architectures
- Provide senior technical leadership, set reliability standards, and mentor less‑experienced engineers
Requirements:
- Bachelor's degree preferred; relevant experience considered in lieu of degree
- Typically 7+ years of relevant experience in SRE, platform, or infrastructure engineering roles
- 7+ years of hands-on experience operating PostgreSQL databases in cloud environments (Azure strongly preferred)
- Strong production experience (7+ years) supporting high-availability, business-critical database platforms
- Deep expertise with Infrastructure as Code, particularly Terraform
- Experience owning or participating in on-call rotations and incident response
- Strong understanding of database operations, including performance tuning, replication, backup/restore, and recovery
- Experience designing and operating database observability and monitoring solutions
- Solid knowledge of cloud security principles, including least-privilege access and audit readiness
- Proven ability to communicate effectively with technical and non-technical stakeholders
- Background as an SRE with strong database depth (not a traditional DBA role)
- Experience with CI/CD pipelines and Git/GitOps workflows
- Familiarity with Kubernetes (AKS preferred), Helm, and ArgoCD
- Experience operating stateful workloads in Azure cloud environments
- Exposure to regulated or highly controlled environments
- Broader cloud platform experience beyond databases