Vero is an exciting AI infrastructure startup that collaborates closely with NVIDIA and other key organizations in the field. The Senior Storage Engineer will operate, optimize, and scale distributed storage systems to support advanced AI workloads, ensuring performance and reliability for large-scale GPU operations.

Responsibilities:

Operate and support production storage platforms powering large-scale AI workloads
Maintain performance, stability, and reliability across customer environments
Monitor and tune storage systems to ensure predictable throughput and low latency
Troubleshoot end-to-end I/O issues across GPU clients, RDMA networks (InfiniBand or RoCE), and storage infrastructure
Plan and execute upgrades, expansions, and maintenance with minimal disruption
Support customer onboarding, including storage configuration, namespaces, and access controls
Run performance validation and benchmarking
Own incidents, lead root cause analysis, and improve reliability through automation and documentation

Requirements:

Strong Linux systems experience operating storage infrastructure in production environments
Hands-on experience with high-performance or distributed storage systems supporting large-scale AI and HPC clusters
Deep understanding of storage architectures, including parallel file systems and file, object, and block storage (e.g., VAST, DDN, Weka, Lustre)
Experience troubleshooting end-to-end I/O performance across clients, RDMA networks (InfiniBand or RoCE), and storage systems
Experience analyzing and optimizing storage performance, including benchmarking, reliability, and data protection concepts
Experience with ETL and integrations supporting AI/ML workloads

Senior Storage Engineer (AI & GPU)
