TEKsystems is seeking an experienced HPC Systems Engineer to support the design, deployment, optimization, and ongoing operations of high-performance computing (HPC) environments. This role involves working with Linux systems and requires expertise in parallel computing and HPC cluster management.
Responsibilities:
- Design, deploy, administer, and support HPC systems running on Linux (RHEL) platforms
- Install, configure, troubleshoot, and performance‑tune HPC clusters supporting engineering and simulation workloads
- Support and optimize parallel computing applications using MPI, OpenMP, CUDA, and related frameworks
- Configure and administer cluster management and job scheduling tools (e.g., Slurm, PBS, LSF)
- Support and troubleshoot high‑speed interconnects, including InfiniBand
- Develop and maintain automation and operational tooling using C, C++, Python, Bash, and Ansible
- Perform root cause analysis and participate in structured incident, problem, and change management processes aligned with ITIL practices
- Work closely with architects, developers, and customers to ensure system stability, performance, and scalability
- Produce clear technical documentation and communicate effectively with both technical and non-technical stakeholders
Requirements:
- Minimum 3 years of professional experience in HPC-focused software or systems engineering
- Strong hands‑on experience administering Linux systems (RHEL preferred)
- Proficiency in one or more of the following: C, C++, Python, Bash, Ansible
- Working knowledge of parallel computing models and frameworks, including MPI, OpenMP, and CUDA
- Experience with HPC cluster deployment and administration, job scheduling and resource managers, and high‑performance networking (InfiniBand)
- Demonstrated experience installing, configuring, troubleshooting, and tuning HPC workloads
- Understanding of ITIL concepts related to incident, service, and change management
- Strong analytical and problem‑solving skills
- Excellent communication skills and the ability to adapt in a fast‑paced environment
- Experience supporting engineering simulation or scientific computing workloads
- Familiarity with performance profiling and benchmarking tools
- Prior experience supporting enterprise or customer-facing HPC environments
- Exposure to hybrid or cloud‑adjacent HPC solutions