Vultr is a leading cloud infrastructure company dedicated to making high-performance cloud solutions accessible globally. The company is seeking a skilled NetDevOps Engineer to automate and operate RoCE-based Ethernet fabrics in a role that spans network engineering, operations, and automation.
Responsibilities:
- Automate deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers
- Build Ansible and Python-based frameworks to provision, validate, and remediate underlay and overlay networks
- Integrate network automation with Vultr’s source-of-truth systems (NetBox, OpsMill) for intent-driven configuration and validation
- Develop telemetry ingestion and correlation pipelines (gNMI, Prometheus, Kafka, custom collectors) for real-time network health and performance metrics
- Collaborate with platform, orchestration, and product engineering teams to optimize RDMA performance, PFC/ECN behavior, and path symmetry across fabrics
- Implement CI/CD workflows for network configuration changes, including validation, pre-checks, and rollbacks
- Investigate complex network behaviors across layers — flow hashing, congestion domains, ECMP, and overlay interactions
- Contribute to the design of next-generation GPU and AI interconnect fabrics, ensuring seamless integration into Vultr’s global network architecture
Requirements:
- Solid understanding of modern data center networking: EVPN-VXLAN, BGP, MLAG, QoS, and traffic engineering
- Deep familiarity with RoCEv2, RDMA transport tuning, ECN/PFC, and lossless Ethernet design
- Strong experience with automation frameworks such as Ansible and with languages such as Python, Golang, Rust, or PHP
- Comfort working with telemetry and monitoring stacks — Prometheus, Grafana, Loki, ELK, or similar
- Previous experience integrating with NetBox, Nautobot, OpsMill, or similar systems as a source of truth for topology and configuration
- Familiarity with CI/CD systems (GitHub Actions, Jenkins, ArgoCD) for continuous delivery of network automation
- Strong Linux networking background, including namespaces, netlink, and system-level debugging