Provide 24x7x365 health monitoring of Platform Services, infrastructure, and applications using JSP Enterprise System Management tools, including SCOM, Stratosphere, Aternity, Atrium, Splunk, vRealize Operations (vROps), and Prism
Open Break/Fix tickets and engage Platform Services teams and external agencies when system issues are observed; respond within two (2) hours to alerts warning of impending failures in order to prevent service failures
Monitor and, when the task relates to the Platform Services team, participate in JSP Network Operations Security Center (JNOSC) dial-in conference bridges and JSP Chat Rooms supporting JNOSC-led troubleshooting efforts during critical incident resolution (historically 3–5 troubleshooting bridges per month)
Act as the incident coordinator for activities between Platform Services teams supporting compute, applications, and infrastructure; provide status to bridge participants and JNOSC on all Priority 1 incidents affecting the Platform Services IT environments
Notify JNOSC and Platform Services Leadership within thirty (30) minutes of any system or service degradation and provide updates every four (4) hours until the issue is resolved
Maintain central customer lists, problem and solution knowledge bases, work instructions, employee lists, recall lists, and incident management escalation procedures on JSP Portals; report changes to JNOSC
Perform first-tier triage on Windows Server, Red Hat Linux, virtualization, and core service alerts and escalate to SME-level support when needed
Document outages, restoration times, problem resolutions, and customer impact data in the JSP Enterprise ITSM system (currently Remedy)
Support COOP and contingency operations from primary or alternate continuity facilities as needed during exercises or real-world events
Requirements
Two (2) or more years of hands-on experience as a Systems Administrator or Network Operations Center (NOC) operator in a DoD or large enterprise environment (5+ years for Senior; 7+ years for the SME variant)
Working knowledge of Microsoft Windows Server administration; Red Hat Linux administration required for Linux Watch openings
Hands-on experience with one or more enterprise monitoring tools: SCOM, Splunk, vROps, Prism, Aternity, BMC Atrium, or similar
Experience working in a 24x7 operations environment, including comfort with rotating shifts, on-call rotations, and rapid escalation procedures
Experience with an enterprise ITSM ticketing tool (Remedy/Helix preferred) for incident, change, and work order management
Strong situational awareness, communication, and documentation skills; ability to clearly brief technical issues to leadership during outages
Familiarity with DoD STIGs, IAVM compliance, and Active Directory/DNS/DHCP fundamentals
Tech Stack
DNS
ITSM
Linux
Splunk
Benefits
Competitive salary based on experience
Comprehensive benefits package including health, dental, vision, and retirement plans