Yahoo is a technology company that connects brands and partners with a vast audience. It is seeking a strategic Senior Engineering Manager to lead its Tooling & Reliability Platforms team, building modern, efficient platforms and managing the engineers responsible for incident management and reliability tools.

Responsibilities:

Manage and grow a high-performing team
Identify and implement AI-driven efficiencies in the product lifecycle to accelerate platform delivery and engineering productivity
Treat the reliability stack as a product
Define the roadmap for the Incident Management platform, ensuring these tools reduce cognitive load for hundreds of service teams by replacing manual investigation steps with AI-assisted workflows
Drive the integration of GenAI and SRE Agents into production environments
Establish frameworks for validating AI-generated incident summaries and hypotheses to ensure accuracy and guard against hallucinations
Define the vision for the next generation of Resilience Engineering, focusing on building services that make products inherently resilient through automated alert diagnostics and self-healing systems
Act as a high-leverage partner to key vendors, holding them accountable for roadmap delivery and ensuring their features align with the team's vision

Requirements:

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
5+ years of experience leading SRE or DevOps teams in a high-scale, cloud-native environment
Strong background in Software Engineering (Python, Go, or Java) and Infrastructure-as-Code
Deep familiarity with incident management and AIOps tools (e.g., Rootly, PagerDuty, BigPanda)
Experience evaluating and refining AI-generated outputs in a technical or operational context
Proven ability to collaborate with SaaS partners to influence a collective product vision
Comfort operating in an evolving, AI-augmented environment with a focus on continuous learning
Experience with BCP/DR planning or Chaos Engineering
Previous experience implementing large-scale AIOps or 'Self-Healing' infrastructure initiatives

Sr. Engineering Manager, Tooling and Reliability Platforms
