Work closely with customer stakeholders, scientists, and IT professionals to deliver Compute at Scale
Develop, evolve, and administer HPC platforms, and support Scientific applications, workflows, and related infrastructure, both on-prem and Cloud-hosted
Drive architecture, roadmaps, and execution of projects to establish and operate IT infrastructure best practices for customers
Full stack support
Design and evolution of platforms, application administration, supporting customer workflows, profiling and performance tuning
Monitoring and maintenance of scoped systems, platform and systems administration, troubleshooting hardware, software, and networking-related issues
Solution architecting and hands-on engineering (on-prem + Cloud)
Documentation
Collaborating with cross-discipline team members and customers
Supporting internal and customer Architecture and Design efforts
Supporting customers with their workflow pipelines (advisory and hands-on)
Comprehensively documenting new and existing computational assets
Maintaining the flexibility to pivot as engagement scopes may evolve
Support for AWS & GCP Cloud applications, migrations, and modernization
CloudOps / IaC for ongoing platform management
Setup and configuration of AWS & GCP Cloud infrastructure for new platform builds
Ensuring system compliance with company security standards and applicable regulatory requirements
Transition support for modernized services to operational teams
Provide engineering-level troubleshooting and service restoration for operational issues as they arise on supported platforms
Provide training/mentorship for junior-level team members
Serve as an escalation point on multiple engagements to ensure resolution
Requirements
A bachelor’s degree or master’s degree in Computer Science or related field
5+ years of experience administering HPC clusters and systems
Experience with SLURM and Grid Engine scheduling software preferred
5+ years of professional experience in Solution Architecture or Cloud Infrastructure deployment and support
5+ years of professional experience developing or administering compute solutions for Scientific / Research IT domains, Life Sciences being preferred
Experience with Posit products (Package Manager, Connect, Workbench) in either an end-user or administrator capacity
Experience developing scientific workflows on HPC systems using Nextflow
Extensive command-line system administration experience: User and group management
Advanced knowledge of Active Directory, DNS, DHCP, LDAP, NFS, SMB
Building applications from source code, installing, maintaining, and troubleshooting application-level Linux and scientific software in line with industry best practices
Installation and fine-tuning of the Linux operating system
Familiarity with leveraging and maintaining Linux package management systems
Intermediate OS-level networking knowledge
Experience with scripting tools, automation tools, and configuration management tools
Ansible, Terraform, and CloudFormation experience preferred
Experience administering and integrating Scientific / Research applications
Strong time-management skills; able to plan and prioritize tasks and complete projects on time while keeping leadership and stakeholders regularly updated on status
Excellent communication skills, including preparation of written documentation for IT colleagues and end users
Proactive thinking skills to identify potential issues and solution options before incidents occur
Extreme attention to detail, needed to interface with multiple clients simultaneously
Ability to understand and analyze complex technical problems and situations
Candidates must be passionate engineers with a strong vision and a desire to stay on top of trends in the Scientific Computing sector
Ability to work independently or with a team
Ability to take a project from start to finish with minimal supervision
Tech Stack
Ansible
AWS
Cloud
DNS
Google Cloud Platform
Linux
NFS
Terraform
Benefits
Comprehensive health and wellness benefits, including Medical, Dental, and Vision Insurance
Company-provided Life and Long-Term Disability Insurance
Company-sponsored 401(k) Plan
Company-provided continuing education benefit
Team-focused culture and unlimited opportunity for advancement