Perform proactive, documented checkouts and monitoring to avoid service outages or limit their impact. Handle fault isolation, notifications, and escalations to restore service as necessary.
If an issue cannot be resolved within the L1 team, manage the escalation process to L2 (paging the correct team, ensuring an on-call engineer responds, and checking in with them on resolution progress).
Support Issue Bridges where multiple teams are needed to resolve an important issue: set up the call, page the required participants, take timeline notes for Incident Reviews, and send Issue Status updates to Senior Executives.
Review Runbooks and pre-deployment application rollout activities that ensure health monitoring, production readiness and understanding of established recovery procedures.
Maintain awareness of, and support the implementation of, scheduled and CAB-approved changes.
Support proactive maintenance activities such as validating failover recoveries, release deployments, and routine activities required to keep the infrastructure in good health.
Utilize approved AI‑assisted operational tools to support incident triage, alert interpretation, runbook comprehension, documentation, and communication.
Requirements
Primary degree in Computer Science or a related field, with 2-4 years of relevant industry experience.
A+, CCNA, MCP, Linux Certifications, etc. are valued.
Solid working knowledge of application fundamentals and functions, Windows OS, Linux, HP-UX, and BIG-IP load balancers.
Good knowledge of operations procedures, application troubleshooting, and security principles.
Working knowledge of web-based services, including Apache, JBoss, Tomcat, and IIS, and of restoring those services.
Demonstrated ability to apply structured prompting techniques with approved AI‑assisted operational tools to generate, interpret, and summarize incident context, while exercising sound human judgment to validate outputs.
Strong customer service and communication skills.
Comfort working both within a highly collaborative team environment and as an independent performer is essential.
Weekend and off-hours shift work and support are required.
Strong understanding of, and working experience with, ITIL incident and problem management processes, along with strong analytical skills.
Excellent written and verbal English skills, and the ability to succinctly summarize key technical findings and root cause analysis.
Ability to multitask and prioritize with a high attention to detail.
Experience with System Monitoring and Automation is preferred.