- The paper introduces IRN (Improved RoCE NIC), an RDMA NIC design that removes the need for PFC by using selective retransmission and BDP-based flow control.
- It employs extensive simulations showing that IRN outperforms conventional RoCE with PFC by 6–83% across diverse network conditions.
- The study indicates that adopting IRN simplifies network management and improves overall datacenter performance at a modest hardware cost (3-10% additional NIC memory).
Revisiting Network Support for RDMA: Evaluating the Necessity of PFC in RDMA over Ethernet
The paper "Revisiting Network Support for RDMA" by Mittal et al. asks whether Priority Flow Control (PFC) is actually necessary for Remote Direct Memory Access (RDMA) over Ethernet, specifically for RoCE (RDMA over Converged Ethernet) in datacenters. It presents an alternative to the current practice of relying on PFC to keep the network lossless, and shows how RDMA NIC (Network Interface Card) designs can be improved for better performance.
RDMA benefits datacenter networks by reducing CPU overhead and improving data transfer speeds through direct memory access. However, RoCE as commonly deployed depends on a lossless network provided by PFC, which introduces complications such as head-of-line blocking and congestion spreading. The authors challenge the necessity of PFC and propose an alternative NIC design, the Improved RoCE NIC (IRN), that handles packet loss efficiently without relying on a lossless network fabric.
IRN Design and Its Enhancements
The IRN design makes two fundamental changes to the existing RoCE NIC architecture. First, it replaces go-back-N loss recovery, in which a single loss causes every subsequent packet to be resent, with selective retransmission of only the lost packets, thus reducing redundant retransmissions. The approach resembles TCP's selective acknowledgment but is simplified for NIC hardware, focusing on efficient loss recovery rather than complex congestion management.
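The difference between the two recovery schemes can be illustrated with a minimal sketch. This is a toy model, not the paper's NIC logic: it assumes a single window of packets numbered from 0, that losses happen only on the first transmission, and that every resend succeeds. The function names are hypothetical.

```python
def go_back_n_resends(num_packets: int, lost: set[int]) -> list[int]:
    """Packets resent under go-back-N: everything from the first loss onward."""
    lost_in_window = [seq for seq in lost if 0 <= seq < num_packets]
    if not lost_in_window:
        return []
    first_lost = min(lost_in_window)
    # The sender rewinds to the first missing packet and resends the rest,
    # including packets the receiver had already received correctly.
    return list(range(first_lost, num_packets))


def selective_resends(num_packets: int, lost: set[int]) -> list[int]:
    """Packets resent under selective retransmission: only the lost ones."""
    return sorted(seq for seq in lost if 0 <= seq < num_packets)


if __name__ == "__main__":
    lost = {3, 7}
    print("go-back-N resends:", go_back_n_resends(10, lost))
    print("selective resends:", selective_resends(10, lost))
```

In this toy case with 10 packets and losses at 3 and 7, go-back-N resends seven packets while selective retransmission resends two. The price of the selective scheme is that the receiver must accept and track out-of-order packets.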
Second, IRN introduces a basic form of end-to-end flow control termed BDP-FC (Bandwidth-Delay Product Flow Control), which caps the number of in-flight packets per flow at the network's bandwidth-delay product. This mechanism reduces unnecessary queuing and lowers hardware state requirements by bounding the number of out-of-order packets the NIC must track.
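As a rough illustration of the BDP-FC idea, the sketch below computes a packet cap from link speed, round-trip time, and MTU, and then gates sending on that cap. The numbers (40 Gbps, 24 µs, 1000-byte MTU), the rounding choice, and the function and parameter names are all assumptions made for this example, not values or interfaces taken from the summary above.

```python
def bdp_cap_packets(link_gbps: int, rtt_ns: int, mtu_bytes: int) -> int:
    """Bandwidth-delay product expressed in MTU-sized packets (rounded up)."""
    # Gb/s * ns = bits, so integer arithmetic avoids floating-point rounding.
    bdp_bits = link_gbps * rtt_ns
    bdp_bytes = bdp_bits // 8
    return -(-bdp_bytes // mtu_bytes)  # ceiling division


def can_send(next_seq: int, oldest_unacked: int, cap: int) -> bool:
    """Allow a new packet only while fewer than `cap` packets are in flight."""
    in_flight = next_seq - oldest_unacked
    return in_flight < cap


if __name__ == "__main__":
    cap = bdp_cap_packets(link_gbps=40, rtt_ns=24_000, mtu_bytes=1000)
    print("BDP cap (packets):", cap)
    print("can send packet 119:", can_send(119, 0, cap))
    print("can send packet 120:", can_send(120, 0, cap))
```

Because the cap bounds how far ahead of the last acknowledgment a sender can run, it also bounds how many out-of-order packets a receiver could ever need to track for one flow, which is what keeps the per-flow state small.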
In extensive simulations, the paper reports that IRN without PFC outperforms RoCE with PFC by 6-83% across a range of network scenarios. The key implication is that PFC can be eliminated, which simplifies network management and avoids the performance problems PFC introduces.
Comparative Analysis and Practicality
The paper also compares IRN against iWARP, another protocol designed for RDMA over IP networks. iWARP rightly assumes an unreliable network, but it handles loss by implementing an entire TCP stack within the NIC. IRN offers a more efficient, lower-complexity alternative by limiting its changes to what the NIC needs in order to handle packet loss.
The authors further argue that IRN is feasible to implement, citing modest resource overheads of 3-10% in NIC memory. This suggests that deploying IRN in place of or alongside traditional RoCE implementations is a pragmatic path to industry adoption: it stays close to RoCE's design while removing the complexities RoCE incurs when operating over Ethernet.
Implications and Future Directions
The refinements proposed in the IRN design suggest that operators deploying RDMA in large datacenters can benefit from reconsidering PFC's role. Loss-tolerant NIC designs open the way to more cost-effective deployments with more consistent performance and simpler operation.
Future explorations may delve into integrating advanced congestion control into IRN, further optimizing its effectiveness in complex, high-traffic networks. Moreover, examining the broader application of IRN principles could extend RDMA's utility beyond datacenter environments, potentially impacting other areas that rely heavily on reliable, high-speed data transfers.
In summary, this paper posits a compelling case for revisiting the architectural and operational assumptions underlying RDMA deployments, offering constructive pathways that challenge existing frameworks and encourage more adaptable, resilient network designs.