Armada is an edge computing startup that provides computing infrastructure to remote areas with limited connectivity. They are seeking a Senior Network Engineer to design, implement, and operate their global network infrastructure, ensuring scalable and secure networks for edge and GPU-enabled deployments.
Responsibilities:
- Design and maintain scalable network architectures for edge, modular, and distributed data center environments
- Define standards for routing, switching, firewalling, segmentation, and redundancy
- Design and operate modern data center network fabrics, including:
  - VXLAN/EVPN-based overlay networks
  - Underlay routing protocols (e.g., BGP, IS-IS, OSPF)
  - Spine–leaf architectures and high-availability designs
- Define overlay/underlay separation, IP addressing strategies, and multi-tenant segmentation models
- Design and support GPU and high-performance compute (HPC) network architectures, including:
  - High-bandwidth, low-latency fabrics for GPU clusters
  - East–west traffic optimization for distributed compute workloads
- Define network designs to support AI/ML, inference, and GPU-accelerated workloads, accounting for throughput, latency, jitter, and congestion characteristics
- Collaborate with compute and platform teams to align GPU topology, network fabric design, and workload placement
- Own network design documentation including HLDs, LLDs, diagrams, and IP plans
- Evaluate new technologies and vendors to support edge, remote, data center, and GPU-enabled deployments
- Implement and support advanced L3 network services, including routing, firewall policies, VPNs, VXLAN/EVPN fabrics, and GPU-supporting network designs
- Lead complex troubleshooting involving:
  - Overlay/underlay interaction issues
  - Control-plane instability
  - Data-plane forwarding anomalies
  - Latency, packet loss, congestion, and convergence issues
- Implement and troubleshoot networks supporting GPU clusters and accelerated workloads, including:
  - High-throughput spine–leaf fabrics
  - East–west traffic patterns and microburst behavior
- Diagnose and resolve network performance issues impacting GPU workloads, including congestion, packet loss, and latency sensitivity
- Serve as the final escalation point for the NOC and infrastructure teams
- Perform root cause analysis (RCA) and drive permanent corrective actions for recurring or systemic issues
- Define and enforce engineering standards for production network changes
- Design and support hybrid connectivity models including ISP, LTE/5G, satellite, and private links
- Ensure reliable and secure connectivity for remote, industrial, and mission-critical edge sites
- Optimize network performance for latency-sensitive, bandwidth-constrained, and intermittently connected environments
- Design and enforce network segmentation across IT, OT, management, and customer-facing domains
- Implement and maintain firewall policies, VPNs, and secure remote access solutions
- Partner with security teams to align network architectures with security, compliance, and regulatory requirements
- Develop and maintain automation for network provisioning, validation, and configuration consistency
- Improve monitoring, alerting, and observability in partnership with the NOC
- Drive reliability improvements, fault isolation, and MTTR reduction through design, automation, and standards
- Mentor NOC technicians and junior network engineers
- Partner with product, infrastructure, platform, and field teams during deployments and lifecycle planning
- Provide technical leadership during major incidents, change reviews, and post-incident analysis
- Contribute to long-term network strategy and roadmap planning
Requirements:
- US Citizenship
- 7+ years of experience in network engineering or infrastructure roles
- Strong expertise in routing, switching, and firewall technologies
- Hands-on experience operating complex L3 networks in production
- Proven ability to troubleshoot overlay, underlay, and performance-related network issues
- Strong documentation and communication skills
- Certifications: CCNP, JNCIP, or equivalent
- Hands-on experience with VXLAN/EVPN and spine–leaf data center architectures
- Experience designing or operating networks for GPU clusters, AI/ML, or high-performance computing environments
- Automation experience (Python, Ansible, Terraform)
- Exposure to cloud networking and hybrid architectures