Revisiting Network Support for RDMA (1806.08159v1)

Published 21 Jun 2018 in cs.NI

Abstract: The advent of RoCE (RDMA over Converged Ethernet) has led to a significant increase in the use of RDMA in datacenter networks. To achieve good performance, RoCE requires a lossless network which is in turn achieved by enabling Priority Flow Control (PFC) within the network. However, PFC brings with it a host of problems such as head-of-the-line blocking, congestion spreading, and occasional deadlocks. Rather than seek to fix these issues, we instead ask: is PFC fundamentally required to support RDMA over Ethernet? We show that the need for PFC is an artifact of current RoCE NIC designs rather than a fundamental requirement. We propose an improved RoCE NIC (IRN) design that makes a few simple changes to the RoCE NIC for better handling of packet losses. We show that IRN (without PFC) outperforms RoCE (with PFC) by 6-83% for typical network scenarios. Thus not only does IRN eliminate the need for PFC, it improves performance in the process! We further show that the changes that IRN introduces can be implemented with modest overheads of about 3-10% to NIC resources. Based on our results, we argue that research and industry should rethink the current trajectory of network support for RDMA.

Citations (190)

Summary

  • The paper introduces IRN, an improved RoCE NIC design that removes the need for PFC by using selective retransmission and BDP-based flow control.
  • Simulations show that IRN without PFC outperforms conventional RoCE with PFC by 6–83% in typical network scenarios.
  • The study indicates that adopting IRN simplifies network management and improves datacenter performance, at a modest cost of about 3-10% in additional NIC resources.

Revisiting Network Support for RDMA: Evaluating the Necessity of PFC in RDMA over Ethernet

The paper "Revisiting Network Support for RDMA" by Mittal et al. asks whether Priority Flow Control (PFC) is fundamentally required to run Remote Direct Memory Access (RDMA) over Ethernet, specifically for RoCE (RDMA over Converged Ethernet) in datacenter networks. It argues that the current practice of enabling PFC to keep the network lossless is an artifact of RoCE NIC (Network Interface Card) designs, and it proposes NIC changes that handle packet loss well enough to make PFC unnecessary.

RDMA reduces CPU overhead and speeds up data transfers by letting the NIC read and write application memory directly. However, RoCE as deployed today depends on a lossless network provided by PFC, and PFC brings problems of its own: head-of-the-line blocking, congestion spreading, and occasional deadlocks. The authors challenge the necessity of PFC and propose an alternative NIC design, the Improved RoCE NIC (IRN), that handles packet losses efficiently without relying on a lossless network fabric.

IRN Design and Its Enhancements

The IRN design makes two key changes to the existing RoCE NIC. First, it replaces go-back-N loss recovery with selective retransmission, so the sender resends only the lost packets rather than every packet sent after the first loss. The approach resembles TCP's selective acknowledgment but is simplified for NIC hardware, focusing on efficient loss recovery rather than on complex congestion management.
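The difference between the two recovery schemes can be illustrated with a toy model. The sketch below is not IRN's NIC logic; the function names and the example loss pattern are assumptions chosen only to show how many packets each scheme would resend for the same set of losses.

```python
def go_back_n_retransmits(sent, lost):
    """Go-back-N: rewind to the first lost packet and resend everything after it."""
    if not lost:
        return []
    first_lost = min(lost)
    return [seq for seq in sent if seq >= first_lost]


def selective_retransmits(sent, lost):
    """Selective retransmission: resend only the packets that were actually lost."""
    return [seq for seq in sent if seq in lost]


# Hypothetical example: ten packets in flight, packets 3 and 7 dropped.
sent = list(range(10))
lost = {3, 7}

print(go_back_n_retransmits(sent, lost))  # [3, 4, 5, 6, 7, 8, 9]
print(selective_retransmits(sent, lost))  # [3, 7]
```

In this example go-back-N resends seven packets to recover two losses, which is the redundancy that selective retransmission avoids.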

Second, IRN adds a basic form of end-to-end flow control, BDP-FC (Bandwidth-Delay Product Flow Control), which caps the number of in-flight packets at the network's bandwidth-delay product. The cap reduces unnecessary queuing and, by bounding how many packets can arrive out of order, limits the state the NIC hardware must keep.
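A minimal sketch of a BDP-based cap follows. The link speed, round-trip time, and packet size are illustrative assumptions rather than values taken from this summary, and the `BdpFcSender` class is a hypothetical software stand-in for logic that IRN places in NIC hardware.

```python
def bdp_cap_packets(bandwidth_bps: int, rtt_ns: int, mtu_bytes: int) -> int:
    """Bandwidth-delay product expressed in MTU-sized packets, rounded up."""
    bdp_bits = bandwidth_bps * rtt_ns // 1_000_000_000
    bdp_bytes = bdp_bits // 8
    return -(-bdp_bytes // mtu_bytes)  # ceiling division on integers


class BdpFcSender:
    """Sender that refuses to exceed a fixed cap on in-flight packets."""

    def __init__(self, cap_packets: int) -> None:
        self.cap_packets = cap_packets
        self.in_flight = 0

    def can_send(self) -> bool:
        return self.in_flight < self.cap_packets

    def on_send(self) -> None:
        if not self.can_send():
            raise RuntimeError("BDP cap reached; wait for an acknowledgment")
        self.in_flight += 1

    def on_ack(self, acked_packets: int = 1) -> None:
        # Each acknowledged packet frees one slot under the cap.
        self.in_flight = max(0, self.in_flight - acked_packets)


# Assumed values: 40 Gbps link, 24 microsecond RTT, 1000-byte packets.
cap = bdp_cap_packets(40_000_000_000, 24_000, 1000)
sender = BdpFcSender(cap)
while sender.can_send():
    sender.on_send()
print(cap, sender.in_flight)  # 120 120

sender.on_ack()
print(sender.can_send())  # True
```

Integer nanoseconds are used for the RTT so that the cap is computed without floating-point rounding.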

In simulations, the paper reports that IRN without PFC outperforms RoCE with PFC by 6-83% in typical network scenarios. The implication is that PFC can be eliminated, which simplifies network management and avoids the performance problems that PFC introduces.

Comparative Analysis and Practicality

The paper also compares IRN with iWARP, another protocol for RDMA over IP networks. iWARP likewise assumes that the network may drop packets, but it copes by implementing an entire TCP stack within the NIC. IRN is a more efficient, lower-complexity alternative that limits its changes to the NIC's handling of packet loss.

The authors also argue that IRN is feasible to implement, estimating modest overheads of about 3-10% in NIC resources. This suggests that deploying IRN in place of or alongside existing RoCE implementations is a pragmatic path for industry adoption: it stays close to RoCE's design while removing the complexities that PFC brings to Ethernet operation.

Implications and Future Directions

The IRN results suggest that operators running RDMA in large datacenters should reconsider PFC's role. Loss-tolerant NIC designs offer the prospect of more cost-effective deployments with more consistent performance and simpler operations.

Future explorations may delve into integrating advanced congestion control into IRN, further optimizing its effectiveness in complex, high-traffic networks. Moreover, examining the broader application of IRN principles could extend RDMA's utility beyond datacenter environments, potentially impacting other areas that rely heavily on reliable, high-speed data transfers.

In summary, the paper makes a strong case for revisiting the architectural and operational assumptions behind RDMA deployments, and it offers a concrete NIC design that challenges the lossless-network requirement and points toward more adaptable, resilient network designs.