Supports the team in coordinating appropriate resources to resolve critical incidents in accordance with service level agreements and operational level agreements.
Supports effective communication during a major system outage, ensuring IT management and the businesses are kept updated until the incident is resolved.
Supports documentation of the chronology of events during incident management conference calls.
Updates knowledge management database of major incidents and subsequent responses.
Updates the incident reporting systems with resolution information, liaises with problem management on detection of potential trends, and prepares historical incident reports and root cause analyses to identify trends and drive down repeat, service-impacting failures.
Supports response to emergency changes in accordance with defined policy and process.
Participates in special projects and performs other duties as assigned.
Manages major IT system incidents through resolution or workaround, minimizing business disruption.
Provides post-incident analysis and improvement recommendations to reduce recurrence.
Ensures adherence to incident management best practices and contributes to continual process improvements.
Monitors incident trends proactively to enhance system reliability and performance.
Requirements
Minimum 5 years of related work experience, including at least one year in IT service management.
B.E./B.Tech in Computer Science, Information Technology, or a related discipline, or equivalent combination of training and experience.
Knowledge of ITIL-based incident management processes.
Ability to coordinate cross-functional technical teams during high-impact incidents.
Strong communication and documentation skills for managing incident chronologies and stakeholder updates.
Experience with ITSM platforms (e.g., ServiceNow, BMC Remedy).
Analytical thinking for identifying trends and proposing preventive measures.
Ability to remain composed and effective during high-pressure incidents.
Working knowledge of network, server, and cloud services.
Knowledge of CI/CD practices and processes.
Experience working with Kanban boards or Jira.
Experience using Confluence for documentation or collaboration.
Familiarity with monitoring and alerting systems such as Splunk or PagerDuty.
Knowledge of identity, access, or authentication systems such as Okta, Entra, or SailPoint.
Proficiency with AI agents.
Understanding of resiliency or SRE (Site Reliability Engineering) solutions.
Stakeholder management capabilities.
Tech Stack
Cloud
ITSM
ServiceNow
Splunk
Benefits
Support, coaching and feedback from some of the most engaging colleagues around
Opportunities to develop new skills and progress your career
The freedom and flexibility to handle your role in a way that’s right for you