Zillow is reimagining how people move through the real estate market and is seeking a Senior Engineering Manager for Site Reliability Engineering to lead a multidisciplinary team responsible for the infrastructure and reliability of Follow Up Boss. The role involves driving cross-org alignment, modernizing infrastructure, and enhancing developer experience while ensuring high availability and performance.

Responsibilities:

Own execution for the FUB infra & security roadmap, turning strategic goals (e.g., DB scalability, ZGCP adoption, infra cost and reliability targets) into a sequenced, realistic plan with clear milestones and measures of success
Run an exemplary quarterly planning and delivery rhythm, including estimation, risk management, dependency mapping, and stakeholder updates across FUB+ and central platform teams
Ensure the team hits its commitments with few surprises, and when risk emerges, proactively engage partners to adjust scope, resources, or timeline with clear communication of the tradeoffs
Be accountable for reliability, performance, operability, and cost of core FUB services and infrastructure (EC2, RDS/Aurora, Redis/Valkey, networking, queues, SRE tooling)
Lead the team to run a low-toil on-call process it can be proud of: well-defined SLOs and error budgets, actionable alerting, fast incident detection and response, high-quality RCAs, and follow-through on remediation work
Drive urgent, sustained progress on database scaling and performance, including capacity management, query and schema optimization, and modernization of data infrastructure
Lead the FUB modernization strategy and execution for prioritized workloads (e.g., workers, supporting services), balancing devex wins, reliability, and risk while coordinating with central teams
Partner with principal/staff engineers to refine FUB’s service scaling strategy, ensuring clear guidance on when teams build in the monolith vs. new services, and how infra supports these choices
Raise the bar on developer environments and onboarding, reducing friction from dev boxes, tooling setup, and infra access; ensure new engineers can be productive quickly with reliable, self-service workflows
Drive faster, safer deployments by improving CI/CD (GitLab, pipelines, AMI replacements, canary/progressive delivery) and aligning with ZG best practices for trunk-based development and feature flags
Partner with product SDMs and tech leads to lower operational friction for dev teams (e.g., better runbooks, improved observability, easier infra integrations, automated guardrails, and AI tooling that operates within those guardrails)
Lead and grow a high-performing, inclusive SRE/infrastructure/security team: set clear expectations, provide candid feedback, and manage performance
Develop technical leaders within and adjacent to the team (SREs, SDEs, security engineers, P5 ICs) through sponsorship, delegation, and stretch opportunities that expand impact beyond the immediate team
Hire, retain, and onboard talent across SRE and infra SDE roles, ensuring skills match the breadth of FUB infra (AWS, Terraform/Ansible, Kubernetes/ZGCP, observability, security, databases)
Be the primary technical and operational interface for FUB infra with FUB+ leadership and central Zillow platform orgs, driving alignment on priorities, tradeoffs, and architectural decisions
Contribute materially to FUB+ tech vision and infra strategy, especially around service scaling, platform adoption, and our long-term operations model (e.g., SRE ownership boundaries, infra/security shared services, cost posture)
Help identify and resolve cross-org misalignment (e.g., ownership boundaries, duplicated infra work, conflicting platform choices) and advocate for solutions that maximize Zillow-wide value, not just local optimization
Champion innovation that improves reliability, scalability, cost, and devex for multiple teams, including adoption of ZG-standard tooling and patterns and infra-focused AI agents for automation, diagnostics, and operations
Normalize AI usage within the infra team (e.g., code generation, runbook drafting, incident summarization, capacity modeling) and share successful patterns more broadly across FUB+ and platform partners
Partner with security (ZG and FUB) to ensure infra and application environments meet audit, SOC2, SOX, privacy, and app-sec requirements, with clear ownership for remediation work and sustainable controls
Forecast and manage runtime and infra costs (compute, storage, observability, networking), using tagging, dashboards, and guardrails to keep costs within budget while supporting growth

Requirements:

Proven track record as a Senior Engineering Manager or equivalent leading SRE, platform, or infrastructure teams supporting high-availability SaaS products
Experience scaling production systems and databases in a cloud environment (ideally AWS) and leading meaningful improvements to reliability, performance, and cost
Demonstrated ability to shift a team from reactive to proactive roadmap-driven execution, including setting strategy, defining metrics, and driving sustained progress across multiple quarters
Strong background in developer experience and CI/CD, with hands-on familiarity with tools such as Terraform/Ansible, GitLab, Kubernetes/ZGCP, and modern observability stacks
Experience partnering with security, database, networking, and central platform teams in a multi-org environment; able to navigate ambiguity and complex stakeholder landscapes
Demonstrated people leadership as a Senior Engineering Manager: managing senior engineers, handling performance issues with limited support, building inclusive culture, and developing leaders who can operate autonomously
Comfortable experimenting with and operationalizing AI tools in engineering workflows; curiosity and learning mindset around emerging platform and infra capabilities
Strong experience scaling large LAMP/web applications
SaaS / Sales CRM experience is a plus

Senior Manager, Site Reliability Engineering, Follow Up Boss