Build and deliver custom environments with excellent GPU performance for customer workloads
Use AI extensively to automate provisioning, alerting, and recovery
Provision and configure dedicated Kubernetes clusters tailored to customer requirements
Design and implement network segmentation and overlays (VLAN, VXLAN), routing (ECMP, BGP), and tunnels (IPsec with strongSwan) for tenant isolation and performance
Build and maintain Linux images
Set up network monitoring and diagnostics for customer environments
Automate the end-to-end lifecycle of customer compute environments: creation, configuration, validation, and teardown
Requirements
5+ years of experience with Linux virtualization: KVM/QEMU, libvirt, VFIO device passthrough, hugepages, NUMA, CPU pinning
Strong networking fundamentals: VXLAN, VLAN, ECMP, BGP, ARP, and the ability to debug packet-level issues (tcpdump, Wireshark)
Production experience building and operating Kubernetes clusters on bare metal, including load balancing with MetalLB
Proficiency with Linux image building and OS provisioning (kickstart, cloud-init, PXE/iPXE)
Proficiency in Python, Bash, Ansible, and Terraform
Deep experience with NVIDIA GPUs: drivers, MIG, container runtimes (nvidia-container-toolkit), InfiniBand, RDMA/RoCEv2, and GPUDirect for high-performance AI networking
Excellent communication skills and the ability to drive technical decisions across teams
Self-starter who executes quickly, takes ownership, and constantly seeks improvement