GFiber is an Alphabet company that brings Google Fiber and Google Fiber Webpass internet services to homes and businesses across the United States. The Head of Network Reliability Engineering will lead the strategy for ensuring resilient and self-healing infrastructure, overseeing a multi-disciplinary organization responsible for Metro Engineering and Reliability Engineering.

Responsibilities:

Lead the Reliability Engineering and Metro Engineering functions, overseeing both the physical expansion of metro networks and the observability systems that support them
Own the end-to-end Tier 3 escalation lifecycle, working with NOC and Incident Management teams to drive a blameless engineering culture focused on systemic improvement and data-driven root cause analysis
Define the roadmap for Infrastructure-as-Code and GitOps workflows, collaborating with software and network teams to ensure configurations are version-controlled, auditable, and deployed via CI/CD
Drive the strategy for closed-loop automation by partnering with software engineering teams to implement systems that leverage real-time streaming telemetry for autonomous fault detection and remediation
Champion the elimination of operational toil; work across the organization to automate change verification and routine maintenance, allowing the NRE team to focus on high-value reliability engineering

Requirements:

Bachelor's in Computer Science, Electrical Engineering, or equivalent practical experience
10 years of experience in network engineering, with direct experience in operations, site reliability, or network reliability
Experience in IP networking (BGP, OSPF, MPLS), optical transport, and access networks (PON/Wireless)
Experience managing high-stakes incidents and designing high-availability systems
Experience managing engineering teams and driving cross-functional outcomes

Preferred qualifications:

Master's degree in a technical field or equivalent executive leadership experience
Experience implementing SRE/NRE frameworks (SLIs/SLOs/Error Budgets) within a production ISP or cloud environment
Strategic understanding of observability and automation tools (e.g., Prometheus, Grafana, Ansible, Terraform) and their application in an ISP environment
Experience overseeing the lifecycle of automated systems, from defining functional requirements to validating operational readiness
Experience managing complex, hybrid infrastructure—such as combined fiber and wireless networks—at a multi-regional or national scale
