DataDirect Networks (DDN) is a global market leader in AI and multi-cloud data management at scale, known for its innovative data intelligence platform. The Sr. Staff Software Engineer will lead the design and development of next-generation storage infrastructure, architecting scalable solutions that support large-scale data-intensive workloads while collaborating closely with cross-functional teams.
Responsibilities:
- Lead the end-to-end design and implementation of large-scale storage systems, including architecture reviews, system design documents, and technical roadmaps
- Design and optimize distributed systems with a focus on high availability, fault tolerance, scalability, and performance
- Drive innovation in storage technologies, including parallel filesystems and/or object storage systems
- Collaborate with product managers, infrastructure teams, and other engineers to define requirements and deliver robust, production-ready solutions
- Mentor senior and junior engineers, conduct code and design reviews, and foster best practices in software engineering
- Troubleshoot complex production issues in distributed environments and implement long-term preventive solutions
- Contribute to open-source projects or internal tools that advance the state of storage and distributed systems
- Stay current with industry trends and evaluate emerging technologies for adoption
Requirements:
- 10+ years of professional software engineering experience, with at least 5 years in a Senior, Staff, or Principal role focused on backend or infrastructure systems
- Proven expertise in system design, including the ability to create scalable, maintainable architectures for complex distributed systems
- Deep understanding of distributed systems principles (consistency models, CAP theorem, consensus protocols, partitioning, replication, etc.)
- Strong experience building or operating high-availability systems in production environments
- Hands-on experience with storage technologies: parallel filesystems (e.g., Lustre, GPFS/IBM Spectrum Scale, BeeGFS) and/or object storage systems (e.g., Ceph, S3-compatible APIs, MinIO, OpenStack Swift)
- Proficiency in one or more management/orchestration frameworks (e.g., Kubernetes, Slurm, Mesos, or similar resource management systems)
- Strong programming skills in Rust, C++, Go, or Java (Rust strongly preferred)
- Excellent communication skills with a track record of influencing technical direction across teams
- Experience leading large-scale projects from conception through deployment and operations
- Experience in High Performance Computing (HPC) environments, including workload schedulers, burst buffers, or scientific computing storage workflows
- Contributions to open-source storage projects (e.g., Ceph, Lustre)
- Familiarity with cloud-native storage solutions and multi-cloud architectures
- Background in performance tuning and benchmarking of storage systems at scale
- Experience with data durability, erasure coding, or tiered storage strategies