Cornelis Networks delivers the world’s highest-performance scale-out networking solutions for AI and HPC datacenters. The company is seeking a highly experienced Principal Software Engineer to lead the design, development, and upstream enablement of its AI and HPC communication middleware stack.
Responsibilities:
- Lead the design, enablement, and optimization of HPC middleware (MPI and SHMEM) and AI CCL stacks (e.g., NCCL/RCCL and related collective communication libraries)
- Deliver performance-critical communication paths, including low-latency small- and medium-message transfers, bulk SDMA data movement, GPU-Direct and IPC communication, and collective acceleration
- Design and tune collective communication algorithms (latency-optimized and bandwidth-optimized), including GPU-aware collectives
- Integrate middleware with underlying transports and provider layers such as libfabric/OFI, UCX, and verbs-style interfaces to achieve performance, portability, and maintainability
- Implement and optimize memory registration strategies, progress and execution models, completion semantics, multi-rail communication behavior, and GPU memory handling
- Drive upstream contributions across MPI/SHMEM projects, CCL ecosystems, and related components with a focus on upstreamable design and long-term maintainability
- Represent Cornelis Networks in open-source communities through technical reviews, design discussions, and sustained technical leadership
- Implement and prototype Ultra Ethernet capabilities supporting MPI/SHMEM and AI collective communication use cases
- Collaborate with ecosystem partners to validate deployment models and performance scaling on customer-relevant configurations
- Work closely with kernel, driver, and switch teams to deliver end-to-end performance aligned with the Cornelis product roadmap
- Participate in architecture reviews, performance tuning, scaling validation, and multi-layer root-cause investigations
- Analyze performance traces and triage advanced customer issues, translating findings into robust fixes and upstream improvements
- Publish internal and external best practices, including tuning guidance, reference configurations, and debugging methodologies
- Mentor senior engineers and promote best practices for design, testing, documentation, and code quality
- Help define the long-term middleware technical roadmap aligned with product evolution and customer needs
Requirements:
- 12+ years of experience in high-performance systems programming in C/C++ on Linux
- Hands-on experience with MPI internals (Open MPI, MPICH, MVAPICH) and/or SHMEM implementations
- Experience implementing or optimizing collective communications for HPC and/or AI workloads, including NCCL/RCCL (CUDA/ROCm) or related CCL stacks
- Demonstrated ability to design low-latency/high-throughput communication paths and diagnose performance issues using profiling and tracing tools
- Working knowledge of transport and integration layers such as OFI/libfabric, UCX, and verbs-style networking concepts
- Strong understanding of RDMA and of communication performance tuning
- Proven track record of open-source contributions
- Demonstrated technical leadership in complex HPC or AI system software
- Experience developing or maintaining libfabric providers
- Familiarity with Ultra Ethernet (UEC/UET) specifications
- Experience with RoCEv2, congestion control, or Ethernet-based RDMA deployments
- Experience with cluster-scale benchmarking, profiling, and optimization
- Background with Omni-Path/OPX or other high-performance HPC fabrics