TEKsystems is seeking an experienced HPC Systems Engineer to support the design, deployment, optimization, and ongoing operations of high-performance computing (HPC) environments. This role involves working with Linux systems and requires expertise in parallel computing and HPC cluster management.

Responsibilities:

Design, deploy, administer, and support HPC systems running on Linux (RHEL) platforms
Install, configure, troubleshoot, and performance‑tune HPC clusters supporting engineering and simulation workloads
Support and optimize parallel computing applications using MPI, OpenMP, CUDA, and related frameworks
Configure and administer cluster management and job scheduling tools (e.g., Slurm, PBS, LSF)
Support and troubleshoot high‑speed interconnects, including InfiniBand
Develop and maintain automation and operational tooling using C, C++, Python, Bash, and Ansible
Perform root cause analysis and participate in structured incident, problem, and change management processes aligned with ITIL practices
Work closely with architects, developers, and customers to ensure system stability, performance, and scalability
Produce clear technical documentation and communicate effectively with both technical and non-technical stakeholders

Requirements:

Minimum 3 years of professional experience in HPC-focused software or systems engineering
Strong hands‑on experience administering Linux systems (RHEL preferred)
Proficiency in one or more of the following: C, C++, Python, Bash, Ansible
Working knowledge of parallel computing models and frameworks, such as MPI, OpenMP, and CUDA
Experience with HPC cluster deployment and administration
Experience with job schedulers and resource managers
Experience with high-performance networking (InfiniBand)
Demonstrated experience installing, configuring, troubleshooting, and tuning HPC workloads
Understanding of ITIL concepts related to incident, service, and change management
Strong analytical and problem‑solving skills
Excellent communication skills and the ability to adapt in a fast‑paced environment
Experience supporting engineering simulation or scientific computing workloads
Familiarity with performance profiling and benchmarking tools
Prior experience supporting enterprise or customer-facing HPC environments
Exposure to hybrid or cloud‑adjacent HPC solutions

HPC Systems Engineer (Remote)