Comparative evaluation with NCCL GPU‑Initiated Networking
Determine how the communication performance of the stream-triggered MPI GPU communication API, as implemented on HPE Slingshot 11 network interface cards, compares with that of NVIDIA NCCL GPU-Initiated Networking. A direct comparison requires running both systems on a common interconnect, for example by porting the MPI stream-triggered implementation to NVIDIA InfiniBand or by porting NCCL GPU-Initiated Networking to HPE Slingshot, and then conducting controlled benchmarks.
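Once both systems run on the same interconnect, the comparison step of such a benchmark could be summarized roughly as in the sketch below. This is only an illustration and not the authors' methodology: the function name `compare_latencies`, the choice of the median as the summary statistic, and the sample values are all assumptions made here, and the numbers are synthetic placeholders rather than measurements of either system.

```python
from statistics import median


def compare_latencies(system_a_us, system_b_us):
    """Summarize two sets of latency samples (microseconds).

    Hypothetical helper: returns each system's median latency and the
    ratio of system A's median to system B's median.
    """
    if not system_a_us or not system_b_us:
        raise ValueError("both sample lists must be non-empty")
    a_med = median(system_a_us)
    b_med = median(system_b_us)
    return {
        "a_median_us": a_med,
        "b_median_us": b_med,
        "a_over_b": a_med / b_med,
    }


# Synthetic placeholder samples for illustration only, not real results.
example = compare_latencies([4.0, 5.0, 6.0], [2.0, 2.5, 3.0])
print(example)
```

The median is used here because it is less sensitive to outliers than the mean. A controlled study would also need to hold message sizes, node counts, and GPU models fixed across both systems.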
References
Finally, NCCL has recently implemented CPU-free communication; we have not been able to compare our performance with this system because our API has not been ported to Infiniband, and NCCL GPU-Initiated Networking has not been ported to HPE Slingshot.
— Co-Design and Evaluation of a CPU-Free MPI GPU Communication Abstraction and Implementation
(2602.15356 - Bridges et al., 17 Feb 2026) in Related Work (Section 6), final paragraph