Nasscomm is seeking a highly experienced Sr. DevOps Engineer – Storage Platforms to build, automate, and operate large-scale Software Defined Storage and Kubernetes platforms in a private cloud environment. The role focuses on storage engineering with Infrastructure-as-Code and GitOps practices, with an emphasis on scalability, resilience, and performance.
Responsibilities:
- Deploy, automate, and operate large-scale Software Defined Storage architectures across private and public cloud regions, following ITIL practices
- Deploy and support enterprise storage platforms (Pure Storage, HPE, NetApp) and SDS solutions (Ceph, Longhorn)
- Integrate self-service storage workflows for Kubernetes CSI and OpenStack consumers (VM and bare metal)
- Implement and manage backup solutions (preferably Rubrik)
- Build and maintain Infrastructure-as-Code for storage platforms using Ansible, Terraform, Helm, and Git, with Python/Bash automation
- Implement CI/CD pipelines for infrastructure updates, patching, upgrades, testing, and rollback
- Implement and improve monitoring, alerting, and observability for storage systems (capacity, latency, IOPS, recovery health) using GitOps and tools such as Prometheus, Loki, and Grafana
- Perform deep troubleshooting across storage, Kubernetes, hypervisors, networking, and Linux systems
- Develop and maintain technical documentation, architecture diagrams, operational procedures, and runbooks
- Participate in on-call rotations, incident response, and root cause analysis
- Collaborate globally on change management, documentation, and operational best practices
Requirements:
- 6+ years of experience managing enterprise storage and Kubernetes platforms on Linux
- Strong hands-on experience with SDS solutions (Ceph, Longhorn) and storage migrations from legacy systems
- Expertise with block, file, and object storage, including Fibre Channel (Cisco MDS) and IP-based protocols (NVMe-oF or iSCSI fabrics)
- Expert knowledge of Kubernetes and Linux systems (Ubuntu, RHEL/CentOS)
- Proficiency with Infrastructure-as-Code (IaC) (Ansible, Terraform) for provisioning storage and backup schedules
- Expertise in backup technologies (preferably Rubrik)
- Strong scripting skills in Python and Bash
- Experience operating 24x7 mission-critical production environments
- Hands-on experience with KVM-based virtualization platforms (SUSE Harvester, OpenStack)
- Strong written and verbal communication skills
- Proficiency with Git, CI/CD pipelines, and automated testing frameworks
- Ability to write technical documentation and contribute to community wikis or knowledge bases
- Bachelor's degree in computer science or equivalent professional experience
- Experience with Go (Golang) is a plus