World Wide Technology is seeking a Senior IBM Storage Scale Engineer to manage the day-to-day operations of their Storage Scale environment. The role combines engineering, operational design, and knowledge transfer to keep the platform stable and scalable.
Responsibilities:
- Own day-to-day engineering and operational support of IBM Storage Scale (GPFS) environments
- Monitor cluster health, capacity, performance, and availability
- Troubleshoot complex GPFS issues involving quorum, Network Shared Disks (NSDs), metadata, and failure domains
- Partner with adjacent infrastructure teams (compute, network, OS) to resolve cross-stack issues
- Develop production-grade runbooks and standard operating procedures (SOPs), including:
  - Upgrade and patching workflows
  - Incident response and recovery procedures
  - Change management and rollback processes
- Translate tribal knowledge into clear, repeatable documentation
- Continuously improve operational consistency and reliability
- Train and mentor other engineers on Storage Scale operations and best practices
- Lead structured knowledge-transfer sessions and hands-on walkthroughs
- Gradually enable broader teams to safely assist with upgrades, patching, and operational tasks
- Plan and execute Storage Scale upgrades and maintenance activities
- Validate changes in non-production environments prior to rollout
- Identify risks, dependencies, and mitigation strategies related to platform changes
Requirements:
- 8+ years of infrastructure engineering experience
- 3+ years of hands-on IBM Storage Scale (GPFS) experience in production environments
- Strong understanding of Linux systems, storage architectures, and high-availability platforms
- Proven experience supporting and upgrading GPFS clusters
- Ability to troubleshoot complex, multi-node storage issues
- Experience in HPC, research computing, large-scale analytics, or AI workloads
- Prior ownership of storage platform documentation and operational runbooks
- Experience training or mentoring engineers on complex infrastructure platforms
- Familiarity with change management and ITIL-aligned operational practices