Receiving escalations from on-site Level 1–3 teams, triaging root cause hands-on, and owning the technical path to resolution across the full infrastructure stack — spanning bare metal, hypervisor, networking, and application layers — using direct access to production environments, log analysis, and service-level debugging
Engaging directly with HPE engineers: writing structured defect reports with reproduction steps, attending engineering triage calls, and validating patches in customer environments before broader rollout
Translating field-observed symptoms into actionable technical requirements for product and engineering teams, distilling vague customer complaints into precise, reproducible problem statements
Training and enabling a managed services team of field engineers who are new to the PCE platform: producing runbooks, leading technical walkthroughs, and building their operational confidence
Monitoring customer environments, triaging alerts, and contributing to dashboard and observability improvements
Conducting on-site work at customer locations
Hardening environments to DISA STIG requirements and supporting audit readiness activities
Producing customer-facing incident summaries and internal knowledge base articles after each major resolution
Requirements
U.S. Citizenship (required without exception)
Active Secret clearance, or demonstrated ability to obtain one (U.S. citizen with clean background)
Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent hands-on experience
10+ years of hands-on experience in infrastructure engineering, systems administration, or technical support — with demonstrated depth in at least three of the following domains:
VMware (ESXi, vCenter, vSAN, NSX) and/or KVM/libvirt — both are relevant; KVM experience is highly desired
Kubernetes and containerized workloads (hands-on cluster operations, not solely architectural review)
Linux systems administration and live troubleshooting (Red Hat, Ubuntu, or similar)
Proven, hands-on troubleshooting experience across the full infrastructure stack: direct access to production environments, log analysis, and live system debugging spanning bare metal, hypervisor, networking, and application layers are daily requirements of this role
Proven experience interfacing between field/support teams and engineering or product organizations, including writing structured defect reports
Strong written communication skills: able to produce engineering-quality defect reports and executive-readable incident summaries
Experience with enterprise monitoring/observability platforms (OpsRamp, Prometheus, Grafana, or equivalent) and ITSM ticketing systems (ServiceNow or equivalent)