Leads an enterprise-wide Reactive Problem Management function covering the full application portfolio; accountable for consistent execution, quality, and outcomes of post-incident problem investigations.
Ensures problems reach true root cause (not symptoms) using structured analysis and evidence-based conclusions; sets quality standards for problem records, timelines, and closure criteria.
Creates, assigns, and rigorously tracks corrective and preventive Action Items, driving cross-team accountability through completion and validating effectiveness in reducing recurrence and improving resiliency.
Partners with application, platform, infrastructure, network, security, and operations teams after service restoration to coordinate investigations, remove blockers, and align on remediation plans.
Drives measurable gains in resiliency, fewer customer disruptions, and lower MTTR by eliminating repeat incidents, improving detection and diagnostics, and strengthening operational readiness.
Without directly owning change or governance processes, uses impact, recurrence, and risk to influence engineering and platform backlogs so that the highest-value remediation work is prioritized and delivered.
Produces clear, executive-ready summaries of root cause, contributing factors, risk exposure, remediation progress, and expected impact; escalates when commitments or timelines are at risk.
Manages and develops a team of 7–10 Problem Managers/analysts; coaches structured problem-solving, stakeholder management, and crisp documentation; sets expectations for pace, rigor, and accountability.
Establishes and maintains standard operating practices for intake, severity/priority, aging management, escalation, and closure; ensures consistency across business units and application teams.
Identifies systemic patterns across incidents and problems, recommends enterprise-level resiliency improvements, and drives preventive initiatives based on trend and impact analysis.
Requirements
5+ years of related experience.
Bachelor’s degree (BS/BA) in Computer Science preferred.
Supervisor: Yes
Benefits
Medical/Dental/Vision coverage
401(k) plan
Tuition reimbursement program
Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)
Paid Parental Leave
Paid Caregiver Leave
Additional sick leave beyond what state and local law require may be available but is unprotected