Lead the design and implementation work that enables and optimizes HPC middleware (MPI and SHMEM) and AI collective communication library (CCL) stacks (e.g., NCCL/RCCL and related libraries)
Deliver performance-critical communication paths including low-latency small and medium message transfers, bulk SDMA data movement, GPU-Direct and IPC communication, and collective acceleration
Design and tune collective communication algorithms (latency-optimized and bandwidth-optimized), including GPU-aware collectives
Integrate middleware with underlying transports and provider layers such as libfabric/OFI, UCX, and verbs-style interfaces to achieve performance, portability, and maintainability
Implement and optimize memory registration strategies, progress and execution models, completion semantics, multi-rail communication behavior, and GPU memory handling
Drive upstream contributions across MPI/SHMEM projects, CCL ecosystems, and related components with a focus on upstreamable design and long-term maintainability
Represent Cornelis Networks in open-source communities through technical reviews, design discussions, and sustained technical leadership
Prototype and implement Ultra Ethernet capabilities supporting MPI/SHMEM and AI collective communication use cases
Collaborate with ecosystem partners to validate deployment models and performance scaling on customer-relevant configurations
Work closely with kernel, driver, and switch teams to deliver end-to-end performance aligned with the Cornelis product roadmap
Participate in architecture reviews, performance tuning, scaling validation, and multi-layer root-cause investigations
Analyze performance traces and triage advanced customer issues, translating findings into robust fixes and upstream improvements
Publish internal and external best practices, including tuning guidance, reference configurations, and debugging methodologies
Mentor senior engineers and promote best practices for design, testing, documentation, and code quality
Help define the long-term middleware technical roadmap aligned with product evolution and customer needs
Requirements
12+ years of experience in high-performance systems programming in C/C++ on Linux