Vero is an exciting AI infrastructure startup working in partnership with NVIDIA and other key organizations shaping the future of data centers and AI infrastructure. As a Senior Storage Engineer, you will operate, optimize, and scale distributed storage systems for advanced AI infrastructure, ensuring performance and reliability for large-scale GPU workloads.

Responsibilities:

Operate and support production storage platforms powering large-scale AI workloads, including ETL pipelines
Maintain performance, stability, and reliability across customer environments
Monitor and tune storage systems to ensure predictable throughput and low latency
Troubleshoot end-to-end I/O issues across GPU clients, RDMA networks (InfiniBand or RoCE), and storage infrastructure
Plan and execute upgrades, expansions, and maintenance with minimal disruption
Support customer onboarding, including storage configuration, namespaces, and access controls
Run performance validation and benchmarking
Own incidents, lead root cause analysis, and improve reliability through automation and documentation

Requirements:

Strong Linux systems experience operating storage infrastructure in production environments
Hands-on experience with high-performance or distributed storage systems supporting large-scale AI or HPC clusters
Deep understanding of storage architectures, including parallel file systems and file, object, and block storage (e.g. Lustre, VAST, DDN)
Experience troubleshooting end-to-end I/O performance across clients, RDMA networks (InfiniBand or RoCE), and storage systems
Experience analyzing, benchmarking, and optimizing storage performance, with a solid grasp of reliability and data protection concepts
Experience with ETL and integrations supporting AI/ML workloads

Senior Storage Engineer (InfiniBand & HPC)
