RedLine Performance Solutions (RedLine) has been in the HPC solutions engineering services business for over 26 years and consistently holds a high "bar of excellence" for new hires. This enables RedLine to accomplish what other firms cannot and promotes a high level of staff retention. We offer services ranging from full life cycle HPC systems engineering to remote managed services to HPC program analysis. We are looking for an HPC System Administrator to join us.
The HPC System Administrator will provide operational support for HPC clusters located in Dayton, OH. Operations run 24x7, so the role includes a rotational on-call requirement. The HPC System Administrator will engage with the customer and participate in the evolution and maintenance of the technical infrastructure in addition to operationally supporting the on-site HPC environment. The administrator will be responsible for relaying insights from this customer engagement to the RedLine Program Manager and working with the Program Manager to translate customer needs into actionable project tasks.
IMPORTANT: The position is at the customer site in Dayton, OH. The administrator functions as the lead point of contact for day-to-day operations and real-time problem resolution. As such, remote work is not viable for this role, but relocation may be considered.
Active DoD Top Secret security clearance and relevant technical certifications (e.g., Linux+, Security+) are mandatory requirements for this position. This full-time (W-2) position offers a full benefits package including paid time off, 401k match, and health care benefits.
Required Skills:
- 7 or more years of Linux systems administration, preferably in a Red Hat and/or Rocky environment
- Strong knowledge of TCP/IP networking
- 5 or more years of HPC cluster system administration experience, preferably with Dell clusters
- Strong experience in Bash, Perl, and Python scripting in a version-controlled environment using Git
- Experience with job scheduling software (e.g., Slurm, PBS)
- Experience with cluster automation tools (e.g., xCAT, HPCM, Bright Cluster Manager)
- Experience with parallel filesystems (e.g., Lustre)
- Experience with high-speed interconnects (e.g., InfiniBand)
- Strong verbal and written communication skills, with the ability to coordinate among multiple team members in remote locations across several disparate projects
- Strong organizational skills
Preferred Skills/Experience:
- Experience with system engineering in addition to system administration
- Red Hat Certification (e.g., RHCSA, RHCE)
- Server automation experience (e.g., Puppet, Foreman, Ansible)
- Experience with MPI technologies
- Experience with Warewulf cluster management and provisioning
- Experience with Weka parallel file systems
- Optimization experience with GPU-based HPC clusters