RedLine Performance Solutions is a leader in High Performance Computing solutions engineering services, dedicated to maintaining high standards for new hires. The Linux HPC Engineer will support the installation and operational maintenance of an HPC cluster, collaborate with a team, and contribute to various infrastructure initiatives while providing technical expertise across multiple projects.
Responsibilities:
- Work on a small team of HPC Systems Administrators responsible for the installation and operational support of an HPC cluster located in Phoenix, Arizona
- Participate in the evolution and maintenance of the technical infrastructure
- Support the on-site HPC environment
- Shift priorities, support parallel efforts, and provide technical expertise across multiple projects, including deployments, upgrades, troubleshooting, and documentation
- Collaborate with cross-functional engineering teams and participate in planned maintenance windows or special projects to meet organizational commitments
- Travel to customer sites, expected to be no more than 25% of the time
Requirements:
- 5 or more years of Linux systems administration, preferably in a Red Hat and/or Rocky environment
- Strong knowledge of TCP/IP networking
- HPC system administration experience (e.g., parallel file systems, cluster management, archival systems)
- Strong experience in Bash, Perl, and Python scripting in a version-controlled environment using Git
- Strong verbal and written communication skills, with the ability to coordinate among team members in remote locations across several disparate projects
- Strong organizational skills
- US citizenship is a mandatory requirement for this position
- Experience with system engineering in addition to system administration
- Cloud administration experience (e.g., Azure, GCP, AWS)
- Experience with deploying and supporting computational models and simulations in HPC infrastructure (e.g., on-premises and cloud, with containers)
- Knowledge and understanding of application hosting, with experience using Cloud Services in a Commercial Infrastructure as a Service (IaaS) or Platform as a Service (PaaS) environment
- Red Hat Certification (e.g., RHCSA, RHCE)
- Server automation experience (e.g., Puppet, Foreman, Ansible)
- Experience with job scheduling software (e.g., Slurm or Moab)
- Experience with cluster automation tools (e.g., xCAT, HPCM, or Bright Cluster Manager)
- Familiarity with a wide range of server and networking hardware (e.g., HPE, SuperMicro, NetGate, Juniper)
- Experience with applications such as Atlassian Confluence, GitLab, or MediaWiki