GFiber is an Alphabet company that brings Google Fiber and Google Fiber Webpass internet services to homes and businesses across the United States. The Head of Network Reliability Engineering will lead the strategy for ensuring resilient and self-healing infrastructure, overseeing a multi-disciplinary organization responsible for Metro Engineering and Reliability Engineering.
Responsibilities:
- Lead the Reliability Engineering and Metro Engineering functions, overseeing both the physical expansion of metro networks and the observability systems that support them
- Own the end-to-end Tier 3 escalation lifecycle, working with NOC and Incident Management teams to drive a blameless engineering culture focused on systemic improvement and data-driven root cause analysis
- Define the roadmap for Infrastructure-as-Code and GitOps workflows, collaborating with software and network teams to ensure configurations are version-controlled, auditable, and deployed via CI/CD
- Drive the strategy for closed-loop automation by partnering with software engineering teams to implement systems that leverage real-time streaming telemetry for autonomous fault detection and remediation
- Champion the elimination of operational toil; work across the organization to automate change verification and routine maintenance, allowing the Network Reliability Engineering (NRE) team to focus on high-value reliability engineering
Requirements:
- Bachelor's degree in Computer Science, Electrical Engineering, or equivalent practical experience
- 10 years of experience in network engineering, with direct experience in operations, site reliability, or network reliability
- Experience in IP networking (BGP, OSPF, MPLS), optical transport, and access networks (PON/Wireless)
- Experience managing high-stakes incidents and designing high-availability systems
- Experience managing engineering teams and driving cross-functional outcomes
Preferred qualifications:
- Master's degree in a technical field or equivalent executive leadership experience
- Experience implementing SRE/NRE frameworks (SLIs/SLOs/Error Budgets) within a production ISP or cloud environment
- Strategic understanding of observability and automation tools (e.g., Prometheus, Grafana, Ansible, Terraform) and their application in an ISP environment
- Experience overseeing the lifecycle of automated systems, from defining functional requirements to validating operational readiness
- Experience managing complex hybrid infrastructure, such as combined fiber and wireless networks, at a multi-regional or national scale