Develop network solutions in collaboration with external solution providers, VARs, and internal Supermicro teams.
Design low-latency, high-throughput AI network fabrics (scale-out, scale-up, converged) to support GPU traffic patterns for training and distributed inference.
Architect rail-optimized and Clos-based multiplane leaf-spine topologies using 100G/400G/800G infrastructure across various networking platforms.
Create multitenant BGP, EVPN, VXLAN, and routing designs for both scale-out internal cluster traffic and external ingress/egress paths to the internet and cloud.
Define and drive networking strategy aligned with business growth, automation goals, and AI infrastructure scalability.
Develop infrastructure-as-code workflows using Ansible, Terraform, and Python to automate provisioning, configuration, and monitoring.
Implement telemetry pipelines and traffic analytics for proactive visibility, capacity planning, and SLA adherence.
Develop high-level and low-level network solution design documentation, playbooks, and operational standards to support scalable deployments and troubleshooting.
Evaluate emerging technologies from NVIDIA, AMD, hyperscalers, and connectivity providers to influence roadmap decisions.
Work closely with platform, hardware, facilities, and security teams to deliver integrated network solutions and infrastructure for AI/ML workloads.
Requirements
15–20 years in network engineering or architecture roles, including large-scale data center or AI infrastructure environments.
Bachelor’s degree in Computer Science, Electrical Engineering, or equivalent experience.
Strong business acumen: able to balance performance, cost, and scalability in architecture decisions.
Customer-focused mindset: experience working closely with customers to design solutions that meet their unique needs and to resolve complex technical challenges.
Strong hands-on experience with open networking switching platforms and SONiC.
Proven track record designing data center fabrics using BGP, OSPF, EVPN-VXLAN, and overlay networks.
Expertise with InfiniBand, RoCEv2, and RDMA-based networking in GPU environments.
Proficient in network automation using Ansible, Terraform, Python, and Git-based workflows.
Ability to define business-aligned network strategy roadmaps for scalable AI infrastructure.
Experience leading HLD/LLD design efforts and technical documentation.
Strong understanding of telemetry, observability, and proactive network health management.
Tech Stack
Ansible
Cloud
Python
Switching
Terraform
Benefits
Comprehensive benefits package including health insurance