ePlus inc. is seeking a Data Center Infrastructure Engineer with strong hands-on expertise in enterprise infrastructure design, deployment, and delivery. This role will lead the implementation of next-generation data center solutions that support HPC and cloud workloads, ensuring performance, scalability, and operational excellence across hybrid infrastructures.
Responsibilities:
- Design, deploy, and support NVIDIA DGX, HGX, or other GPU-based systems within customer environments
- Install and configure GPU platforms, including drivers, firmware, and management tools
- Configure high-speed networking (InfiniBand, Ethernet, VLANs) and validate performance
- Provision compute nodes, configure OS images, and automate deployments
- Implement and manage virtualization platforms (VMware ESXi, vCenter, vSAN, NSX) and hyperconverged infrastructure
- Build and administer containerized platforms using Kubernetes (RKE, OpenShift, EKS, AKS, GKE)
- Integrate storage systems and ensure high-performance data access for workloads
- Implement infrastructure automation and assist with configuration management
- Collaborate with cross-functional teams (networking, DevOps, storage, and application owners) to ensure smooth project delivery
- Troubleshoot and optimize system performance across compute, network, and storage layers
- Provide technical leadership and documentation for customer deployments and internal delivery teams
Requirements:
- 6+ years of experience in data center architecture, infrastructure delivery, or systems engineering roles
- Hands-on experience with GPU platforms and GPU drivers
- Hands-on experience with Linux systems, virtualization, and Kubernetes
- Strong networking knowledge including InfiniBand, Ethernet, VLANs, and RDMA
- Experience with automation tools and scripting (e.g., Ansible, Terraform, Bash, Python)
- Understanding of container orchestration across multiple Kubernetes distributions and of distributed workloads
- Excellent troubleshooting, documentation, and customer-facing communication skills
- Ability to deliver complex projects independently, on time, and in coordination with remote teams