
Data Center Quantized Congestion Notification (DCQCN)

Updated 14 November 2025
  • DCQCN is a scalable, closed-loop congestion control protocol that integrates switch-based ECN with sender rate adaptation to manage congestion in high-performance data centers.
  • The protocol operates over three key loci—Congestion Point, Notification Point, and Reaction Point—to provide effective injection throttling and maintain high throughput.
  • Refinements such as ECP, ENP, ERP, and ICI enhance victim flow handling and reduce control overhead, enabling faster recovery and improved network performance.

Data Center Quantized Congestion Notification (DCQCN) is a scalable, closed-loop congestion control protocol designed for lossless transport and traffic engineering in high-performance data center and supercomputer networks. DCQCN integrates switch-based Explicit Congestion Notification (ECN) with end-host rate adaptation to deliver effective injection throttling and high utilization under diverse workloads, including incast and in-network congestion. DCQCN has served as a foundational mechanism for congestion management on RDMA over Converged Ethernet (RoCEv2), InfiniBand, and similar technologies, and has undergone significant refinement to address emerging workload patterns and sharper performance demands over the past decade (Olmedilla et al., 7 Nov 2025, Merino et al., 6 Nov 2025).

1. Architectural Overview and Mechanistic Pipeline

DCQCN operates across three principal loci in the network data path: the Congestion Point (CP) at the switch egress, the Notification Point (NP) at the receiver, and the Reaction Point (RP) at the sender. This structure underpins a feedback loop predicated on in-network congestion detection, notification delivery, and source-side rate control.

  • Congestion Point (CP): Each switch egress maintains a buffer (queue), and congestion is inferred from its occupancy relative to thresholds, typically labeled $K_{\min}$ and $K_{\max}$.
  • Notification Point (NP): The receiver transforms ECN-marked data packets (i.e., those with the Congestion Experienced bit set) into Congestion Notification Packets (CNPs) that are returned to the sender.
  • Reaction Point (RP): The sender, upon receiving a CNP, throttles injection through multiplicative rate reduction and raises its rate again through additive increase between CNPs.

This division enables scalable, per-flow congestion management and is the basis for both the classic and refined DCQCN designs.

2. Core DCQCN Algorithms and Mathematical Model

Switch Buffer Monitoring and Marking

At the CP, packet marking is governed by the instantaneous queue length ($q$) relative to the configured thresholds:

$P(q)= \begin{cases} 0, & q\leq K_{\min}\\[6pt] P_{\max}\frac{q-K_{\min}}{K_{\max}-K_{\min}}, & K_{\min}<q<K_{\max}\\[8pt] 1, & q\geq K_{\max} \end{cases}$

where $P_{\max} \leq 1$. In the widely used step-marking configuration, setting $K_{\min} = K_{\max} = V$ reduces $P(q)$ to a step function at threshold $V$.
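The marking rule can be sketched in Python as follows. This is a minimal illustration, not switch firmware. The function names are hypothetical, and realizing the probability with one uniform random draw per packet is an assumption (a common way to apply a marking probability).

```python
import random

def mark_probability(q: float, k_min: float, k_max: float, p_max: float = 1.0) -> float:
    """ECN marking probability P(q) at the Congestion Point.

    q, k_min and k_max must share one unit (e.g. KB of queued data).
    With k_min == k_max the ramp disappears and P(q) becomes a step:
    0 at or below the threshold, 1 above it.
    """
    if q <= k_min:
        return 0.0
    if q >= k_max:
        return 1.0
    # Linear ramp from 0 (at k_min) toward p_max (just below k_max).
    return p_max * (q - k_min) / (k_max - k_min)

def should_mark(q: float, k_min: float, k_max: float, p_max: float = 1.0) -> bool:
    """Set the CE bit with probability P(q), using a uniform draw in [0, 1)."""
    return random.random() < mark_probability(q, k_min, k_max, p_max)
```

With step marking (`k_min == k_max`), `should_mark` is deterministic. It never marks at or below the threshold and always marks above it.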

Sender-Side Closed-Loop Control

Each sender maintains a rate variable $R(t)$ and a congestion-severity estimate $\alpha(t)$:

  • CNP-Triggered Rate Reduction:

$R(t^+) = (1-\beta)\, R(t^-)$

where $\beta \in (0,1)$ (typical classic values: $\beta = 0.5$–$0.8$).

  • Severity Updating:

$\alpha(t^+) = (1-g)\,\alpha(t^-) + g, \qquad g = 1/256$

  • AIMD-Style Additive Increase:

$\frac{dR}{dt} = \frac{1 - \alpha}{\mathrm{RTT}}$
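The three sender-side rules can be sketched as a small state object. This is a minimal sketch of the simplified model in this section, not the full classic DCQCN state machine. The class name, the `line_rate` cap and the forward-Euler step for the additive increase are all assumptions.

```python
from dataclasses import dataclass

G = 1.0 / 256  # severity gain g, as given in the text

@dataclass
class ReactionPoint:
    """Sender-side state for the simplified DCQCN model (hypothetical class)."""
    rate: float         # current injection rate R, in arbitrary rate units
    line_rate: float    # upper bound for additive increase (assumed cap)
    beta: float = 0.5   # multiplicative-decrease factor, beta in (0, 1)
    alpha: float = 0.0  # congestion-severity estimate

    def on_cnp(self) -> None:
        """Apply R(t+) = (1 - beta) R(t-) and alpha(t+) = (1 - g) alpha(t-) + g."""
        self.rate *= 1.0 - self.beta
        self.alpha = (1.0 - G) * self.alpha + G

    def additive_increase(self, dt: float, rtt: float) -> None:
        """One Euler step of dR/dt = (1 - alpha) / RTT, capped at line_rate.

        Over one RTT the rate grows by (1 - alpha) rate units.
        """
        self.rate = min(self.line_rate, self.rate + (1.0 - self.alpha) * dt / rtt)
```

For example, starting at `rate=100.0` with `beta=0.5`, one CNP halves the rate to 50 and sets α to 1/256. Each subsequent RTT then adds slightly less than one rate unit.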

Fluid Model and Stability

The closed-loop dynamics of the queue length $q(t)$, marking probability $M(t)$, and sender rate $R(t)$ at a bottleneck of capacity $C$ can be captured as:

$\begin{aligned} \frac{dq(t)}{dt} &= R(t) - C \\ \frac{dR(t)}{dt} &= \frac{1-\alpha(t)}{\mathrm{RTT}} - \beta R(t) \frac{dM(t)}{dt} \end{aligned}$

Linearization yields a first-order transfer function:

$H(s) = \frac{G}{\tau s + 1}$

with gain $G$ and time constant $\tau$ determined by the protocol parameters. Careful parameter selection (notably of $\beta$) is essential for stability and for minimizing rate oscillation and overshoot.
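The following sketch makes the roles of $G$ and $\tau$ concrete. It compares a forward-Euler integration of the first-order model with its closed-form step response $y(t) = G(1 - e^{-t/\tau})$. It illustrates only the linearized model. The function names and the gain and time-constant values used with them are arbitrary placeholders, not DCQCN measurements.

```python
import math

def first_order_step(gain: float, tau: float, t_end: float, dt: float) -> list:
    """Euler-integrate tau * dy/dt + y = gain * u(t) for a unit step u, y(0) = 0."""
    y, trace = 0.0, []
    for _ in range(int(round(t_end / dt))):
        y += dt * (gain - y) / tau
        trace.append(y)
    return trace

def analytic_step(gain: float, tau: float, t: float) -> float:
    """Closed-form step response of H(s) = G / (tau s + 1)."""
    return gain * (1.0 - math.exp(-t / tau))
```

After one time constant the response reaches about 63% of $G$, and after five it exceeds 99%. A smaller $\tau$ therefore means a faster, but not necessarily better damped, reaction once feedback delay is taken into account.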

3. Shortcomings of Classic DCQCN and Need for Refinement

Empirical and analytical studies have identified key limitations in the standard DCQCN scheme:

  • Victim Packet Marking: Every packet enqueued while the queue is above threshold is marked alike, regardless of whether its flow actually contributes to congestion. As a result, non-congesting ("victim") flows suffer unnecessary rate reductions.
  • Excessive Control Feedback Traffic: At high mark rates, the NP generates a CNP per marked packet, resulting in significant control-plane overhead.
  • Context-Free Rate Reduction: The sender's reaction is agnostic to the severity or duration of congestion, causing broad and sometimes excessive throttling.
  • Responsiveness to Short-Lived/Microburst Congestion: The closed-loop reaction operates at a granularity dictated by feedback timing, rendering it slow to react to microbursts or rapidly changing network states (Olmedilla et al., 7 Nov 2025, Merino et al., 6 Nov 2025).

4. Refined Congestion Detection, Notification, and Throttling

Subsequent research (notably Olmedilla et al.) proposes systematic enhancements:

4.1 Enhanced Congestion Point (ECP)

Instead of marking all packets above threshold, ECP considers per-flow queueing delay:

$\delta_{\text{queue}} = t_{\text{deq}} - t_{\text{enq}}$

Packets are marked only if $\delta_{\text{queue}} \geq \theta_{\text{delay}}$. This distinguishes genuine congestion contributors (packets held long in the queue) from victims (packets with short waits). The threshold $\theta_{\text{delay}}$ is tuned experimentally and is on the order of microseconds.
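The delay-based rule can be sketched as an egress FIFO that timestamps packets on enqueue and decides on marking at dequeue. This is a minimal sketch under stated assumptions. The class and field names are hypothetical, times are in microseconds, and evaluating $\delta_{\text{queue}}$ at dequeue is assumed from the definition $t_{\text{deq}} - t_{\text{enq}}$.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Packet:
    flow_id: int
    enq_time_us: float = 0.0
    ecn_ce: bool = False  # Congestion Experienced bit

@dataclass
class DelayMarkingQueue:
    """Egress FIFO that marks a packet only if it waited at least theta_delay_us."""
    theta_delay_us: float
    fifo: deque = field(default_factory=deque, init=False)

    def enqueue(self, pkt: Packet, now_us: float) -> None:
        pkt.enq_time_us = now_us  # timestamp t_enq
        self.fifo.append(pkt)

    def dequeue(self, now_us: float) -> Packet:
        pkt = self.fifo.popleft()
        delta_queue = now_us - pkt.enq_time_us  # t_deq - t_enq
        if delta_queue >= self.theta_delay_us:
            pkt.ecn_ce = True  # long-held packet: treated as a congestion contributor
        return pkt
```

For example, take a hypothetical $\theta_{\text{delay}} = 5\,\mu\text{s}$. A packet enqueued at $t = 0$ and dequeued at $10\,\mu\text{s}$ is marked. A packet enqueued at $8\,\mu\text{s}$ and dequeued at $11\,\mu\text{s}$ is not, even though both passed through the same queue.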

4.2 Enhanced Notification Point (ENP)

ENP aggregates ECN marks by batching them within a window $T_{\text{batch}}$ (e.g., $10\,\mu\text{s}$), emitting a single CNP per flow per interval. If $k$ marks are detected in a window, the CNP carries the count $k$, allowing the source to scale its reaction:

$R(t^+) = \left(1 - \beta \cdot \min\left(1, k/K_{\max}\right)\right) R(t^-)$

This cuts CNP overhead by up to 80% under heavy marking, while each CNP still conveys coarse information about congestion severity.
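A minimal sketch of the batching NP and the matching source-side reduction follows. The names are hypothetical. Opening a window on a flow's first mark and flushing it lazily when the next packet of that flow arrives are simplifying assumptions; a hardware NP would more likely fire a timer at window expiry. Here `k_max` simply normalizes the mark count $k$.

```python
from collections import defaultdict

class BatchingNotificationPoint:
    """Aggregate CE marks per flow and emit at most one CNP per flow per window."""

    def __init__(self, t_batch_us: float = 10.0):
        self.t_batch_us = t_batch_us
        self.marks = defaultdict(int)  # flow_id -> CE marks in the open window
        self.window_start = {}         # flow_id -> time of the window's first mark

    def on_packet(self, flow_id: int, ce_marked: bool, now_us: float):
        """Process one data packet; return (flow_id, k) if a CNP is due, else None."""
        cnp = None
        start = self.window_start.get(flow_id)
        if start is not None and now_us - start >= self.t_batch_us:
            # Window expired: report its mark count k in a single CNP.
            cnp = (flow_id, self.marks.pop(flow_id))
            del self.window_start[flow_id]
        if ce_marked:
            self.window_start.setdefault(flow_id, now_us)  # open a window on first mark
            self.marks[flow_id] += 1
        return cnp

def scaled_decrease(rate: float, beta: float, k: int, k_max: int) -> float:
    """Source-side reaction R(t+) = (1 - beta * min(1, k / K_max)) R(t-)."""
    return (1.0 - beta * min(1.0, k / k_max)) * rate
```

In this sketch, three marked packets arriving within one $10\,\mu\text{s}$ window produce a single CNP with $k = 3$ rather than three separate CNPs. The sender then scales its cut by $k/K_{\max}$ instead of applying the full $\beta$.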

4.3 Enhanced Reaction Point (ERP)

ERP dynamically sets the backoff parameter based on $k$:

$\beta_{\text{eff}} = \beta_0 + (1 - \beta_0) \frac{k}{K_{\max}}$

$\alpha(t^+) = (1-g)\,\alpha(t^-) + g \frac{k}{K_{\max}}$

A lightly marked flow is throttled gently, while a heavily marked flow backs off almost completely. The result is fine-grained, per-flow throttling.
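The ERP update can be sketched as a pure function. It assumes that $\beta_{\text{eff}}$ replaces $\beta$ in the classic reduction $R(t^+) = (1-\beta)R(t^-)$, which the text implies but does not state. Clamping $k/K_{\max}$ to 1 and the `min_rate` floor are further assumptions. The default $\beta_0 = 0.2$ is taken from the parameter table, and `k_max` acts only as the normalizer for the mark count.

```python
def erp_update(rate: float, alpha: float, k: int, k_max: int,
               beta0: float = 0.2, g: float = 1.0 / 256,
               min_rate: float = 0.0) -> tuple:
    """Return (new_rate, new_alpha) after a CNP reporting k marks (hypothetical helper)."""
    severity = min(1.0, k / k_max)                     # k / K_max, clamped (assumption)
    beta_eff = beta0 + (1.0 - beta0) * severity        # beta_eff = beta0 + (1 - beta0) k / K_max
    new_rate = max(min_rate, (1.0 - beta_eff) * rate)  # assumed floor keeps the flow alive
    new_alpha = (1.0 - g) * alpha + g * severity       # alpha(t+) = (1 - g) alpha(t-) + g k / K_max
    return new_rate, new_alpha
```

With $K_{\max} = 10$, a CNP reporting $k = 1$ cuts the rate by 28%, while $k = 5$ cuts it by 60%. This is the graded response described above.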

4.4 Results of Refined Mechanisms

Experimental results demonstrate:

  • False-positive victim marking reduction exceeding 90%.
  • 60–80% CNP traffic reduction.
  • 100% fabric utilization is restored within a single extra RTT (versus 3–5 RTTs for recovery in classic DCQCN).
  • In case studies, victim flows reach unimpeded line-rate throughput, the completion time of all flows drops from ~12.5 ms (classic) to ~4 ms (refined), and aggregate throughput rises from 15–20 GB/s to 25 GB/s (Olmedilla et al., 7 Nov 2025).

5. Integration with Congestion Isolation: DCQCN and ICI

Improved Congestion Isolation (ICI) (Merino et al., 6 Nov 2025) combines DCQCN with dynamic congestion isolation. This yields better discrimination between victim and congesting flows and faster response to microbursts.

5.1 Flow Isolation Logic

When switch queue occupancy exceeds a threshold $T_{\text{iso}}$, dominant flows are extracted via a Congestion Flow Table (CFT) and redirected to dedicated Congesting Flow Queues (CFQs). Entry allocation, maintenance, and eviction are governed by the CFT size and a residency timer $T_{\text{res}}$.
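The isolation logic can be sketched as a bounded table with per-entry residency timers. Several choices here are assumptions: choosing the flow with the most queued bytes as the "dominant" flow, refreshing an entry's timer whenever the flow is reselected, and leaving a flow in the shared queue when the table is full. The source states only that allocation and eviction depend on the CFT size and $T_{\text{res}}$. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class CongestionFlowTable:
    """Bounded table of isolated flows with a residency timer (hypothetical sketch)."""
    capacity: int    # CFT size: maximum number of isolated flows
    t_iso: int       # queue occupancy (bytes) above which isolation is attempted
    t_res_us: float  # residency timer T_res
    entries: Dict[int, float] = field(default_factory=dict)  # flow_id -> last refresh time

    def expire(self, now_us: float) -> None:
        """Evict flows whose residency timer has run out."""
        stale = [fid for fid, t in self.entries.items() if now_us - t >= self.t_res_us]
        for fid in stale:
            del self.entries[fid]

    def maybe_isolate(self, queue_bytes: int, per_flow_bytes: Dict[int, int],
                      now_us: float) -> Optional[int]:
        """If occupancy exceeds T_iso, move the flow with most queued bytes to a CFQ."""
        self.expire(now_us)
        if queue_bytes <= self.t_iso or not per_flow_bytes:
            return None
        dominant = max(per_flow_bytes, key=per_flow_bytes.get)
        if dominant in self.entries or len(self.entries) < self.capacity:
            self.entries[dominant] = now_us  # allocate or refresh the entry
            return dominant
        return None  # table full: the flow stays in the shared queue

    def is_isolated(self, flow_id: int) -> bool:
        """Packets of isolated flows would be steered to a CFQ."""
        return flow_id in self.entries
```

The table size bounds switch memory, and $T_{\text{res}}$ bounds how long a flow that has stopped congesting keeps occupying an entry. This reflects the hardware-budget trade-off noted in Section 6.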

5.2 Marking Policy Based on Isolation State

For isolated flows (CFQ), standard DCQCN marking thresholds are used; for non-isolated (victim) flows, thresholds are raised:

$K_{\min}^{(V)} > K_{\min}^{(C)}, \quad K_{\max}^{(V)} > K_{\max}^{(C)}$

where the superscripts $(V)$ and $(C)$ denote victim and congesting flows, respectively. The victim marking probability is then

$p_V(q) = \max\left(0, \min\left(1, \frac{q-K_{\min}^{(V)}}{K_{\max}^{(V)}-K_{\min}^{(V)}}\right)\right)$

Victim flows are therefore marked only under severe overload, while congesting flows are marked earlier and more aggressively.
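The state-dependent marking policy can be sketched as a threshold lookup followed by the clamped ramp from $p_V(q)$. Applying the same clamped ramp (that is, $P_{\max} = 1$) to isolated flows is an assumption. So is taking $q$ to be the occupancy of whichever queue the flow currently uses. The function names and threshold values are hypothetical.

```python
from typing import Tuple

Thresholds = Tuple[float, float]  # (K_min, K_max) in one queue-length unit

def ramp(q: float, k_min: float, k_max: float) -> float:
    """Clamped linear ramp max(0, min(1, (q - k_min) / (k_max - k_min)))."""
    return max(0.0, min(1.0, (q - k_min) / (k_max - k_min)))

def ici_mark_probability(q: float, isolated: bool,
                         congesting: Thresholds, victim: Thresholds) -> float:
    """Pick thresholds by isolation state, then apply the clamped ramp."""
    # The text requires the victim thresholds to sit above the congesting ones.
    assert victim[0] > congesting[0] and victim[1] > congesting[1]
    k_min, k_max = congesting if isolated else victim
    return ramp(q, k_min, k_max)
```

For example, take congesting thresholds (10, 50) and victim thresholds (40, 120). At $q = 30$ an isolated flow is marked with probability 0.5, while a victim flow is not marked at all.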

5.3 Effects and Measured Improvements

  • Up to 32× fewer BECNs (backward explicit congestion notifications), since victim flows seldom receive congestion feedback.
  • 99th percentile flow completion time reduction by up to 31%.
  • No degradation in aggregate throughput; PFC traffic (indicative of HoL blocking) drops by ~20%.
  • Reacts to microbursts on sub-RTT timescales and avoids unnecessary feedback-induced oscillations (Merino et al., 6 Nov 2025).

6. Performance Implications, Trade-offs, and Deployment Considerations

The table below summarizes the key parameters and measured effects of the refined mechanisms:

| Mechanism | Parameterization | Key improvement |
|---|---|---|
| ECP (marking on delay) | $\theta_{\text{delay}} \approx$ a few µs | >90% reduction in victim marks |
| ENP (batching) | $T_{\text{batch}} = 10\,\mu\text{s}$ | 60–80% CNP suppression |
| ERP (scaled backoff) | $\beta_0 = 0.2$, $K_{\max} = 15\,\text{KB}$ | Full utilization within 1 RTT |
| ICI integration | CFT, $T_{\text{iso}}$, $T_{\text{res}}$ | Up to 32× fewer BECNs, up to 31% lower p99 FCT |

Refined DCQCN and hybrid approaches preserve, and in some cases improve, throughput and fairness. They also sharply reduce congestion-signaling volume and curb tail latency, especially for short-lived and victim flows. Parameter selection is crucial for balancing stability, responsiveness, and hardware resource budgets (e.g., CFT size in switches).

A plausible implication is that carefully combining injection throttling (via DCQCN) with dynamic flow isolation (via congestion isolation, CI, or its improved variant, ICI) leaves further room for performance and efficiency gains. This holds provided the combination is complemented by additional awareness at both the switch and end-host layers.

7. Conclusion and Research Directions

DCQCN has demonstrated adaptability and scalability for congestion management in high-performance networks. This is especially true when it is extended by mechanisms that distinguish congestion contributors from victims and integrate with real-time isolation logic. Successive refinements (ECP, ENP, ERP, and ICI) have addressed the principal limitations of classic DCQCN, improving fairness, lowering latency, and reducing control traffic. Ongoing research focuses on the intersection of closed-loop feedback, real-time flow classification, and workload-driven parameterization. The aim is to further minimize congestion impact and optimize resource utilization in dense, accelerator-augmented, and AI-driven data center networks (Olmedilla et al., 7 Nov 2025, Merino et al., 6 Nov 2025).
