Scaling SCR multipathing to hundreds of paths on NVIDIA BlueField-3 DPA

Determine whether SCR (the "White-Boxing RDMA" approach implemented on NVIDIA BlueField-3 DPA) can scale its multipath transport to hundreds of distinct paths given the limited L1/L2 cache available on the DPA cores.

Background

SCR demonstrates receiver-driven congestion control and multipathing on NVIDIA BlueField-3 SmartNICs using the DPA (RISC-V cores). The prototype shows only two paths, and the authors of the cited work point to the DPA's limited L1/L2 cache and state that it is not clear whether SCR could scale to hundreds of paths, a capability that matters for mitigating flow collisions in ML clusters.

Establishing whether SCR can scale to hundreds of paths would determine the practicality of DPA-based multipath transports for large-scale ML workloads where per-path state and fast decision-making are critical.
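To make the cache concern concrete, the following is a minimal back-of-envelope sketch of how per-path state could grow with the number of paths. It is not taken from SCR or the cited paper: the class `PathStateModel`, the helper `fits_in_cache`, the 64-byte per-path state size, the flow count, and the 1 MiB cache budget are all hypothetical placeholders chosen for illustration, not measured BlueField-3 DPA figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStateModel:
    """Hypothetical model of the transport's per-path working set."""

    bytes_per_path: int  # assumed size of one path's state (e.g. RTT estimate, counters)
    paths_per_flow: int  # number of distinct paths each flow is spread over
    active_flows: int    # flows whose state must stay hot at the same time

    def working_set_bytes(self) -> int:
        # Total state the cores touch if every path of every flow is active.
        return self.bytes_per_path * self.paths_per_flow * self.active_flows


def fits_in_cache(model: PathStateModel, cache_bytes: int) -> bool:
    """Return True if the modeled working set fits within the cache budget."""
    return model.working_set_bytes() <= cache_bytes


# Placeholder budget for illustration only; not a real DPA cache size.
CACHE_BUDGET_BYTES = 1 * 1024 * 1024

two_paths = PathStateModel(bytes_per_path=64, paths_per_flow=2, active_flows=100)
many_paths = PathStateModel(bytes_per_path=64, paths_per_flow=256, active_flows=100)

print(two_paths.working_set_bytes(), fits_in_cache(two_paths, CACHE_BUDGET_BYTES))
print(many_paths.working_set_bytes(), fits_in_cache(many_paths, CACHE_BUDGET_BYTES))
```

Under these assumed numbers, the two-path configuration needs 12,800 bytes and fits the placeholder budget, while the 256-path configuration needs 1,638,400 bytes and does not. The real answer depends on SCR's actual per-path state layout and the DPA's actual cache sizes, which this sketch does not know.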

References

For multipathing, SCR only demonstrates two paths; given the limited L1/L2 cache in DPA, it is not clear if SCR could scale to hundreds of paths.

An Extensible Software Transport Layer for GPU Networking (2504.17307 - Zhou et al., 24 Apr 2025) in Section 7, Other Related Work