Learning-Based vs Human-Derived Congestion Control: An In-Depth Experimental Study

Published 29 Oct 2025 in cs.NI | (2510.25105v1)

Abstract: Learning-based congestion control (CC), including Reinforcement Learning, promises efficient CC in a fast-changing networking landscape, where evolving communication technologies, applications and traffic workloads pose severe challenges to human-derived, static CC algorithms. Learning-based CC is in its early days and substantial research is required to understand existing limitations, identify research challenges and, eventually, yield deployable solutions for real-world networks. In this paper, we extend our prior work and present a reproducible and systematic study of learning-based CC with the aim to highlight strengths and uncover fundamental limitations of the state-of-the-art. We directly contrast said approaches with widely deployed, human-derived CC algorithms, namely TCP Cubic and BBR (version 3). We identify challenges in evaluating learning-based CC, establish a methodology for studying said approaches and perform large-scale experimentation with learning-based CC approaches that are publicly available. We show that embedding fairness directly into reward functions is effective; however, the fairness properties do not generalise to unseen conditions. We then show that existing RL-based approaches can acquire all available bandwidth while largely maintaining low latency. Finally, we highlight that the latest learning-based CC approaches under-perform when the available bandwidth and end-to-end latency change dynamically, while remaining resistant to non-congestive loss. As with our initial study, our experimentation codebase and datasets are publicly available with the aim to galvanise the research community towards transparency and reproducibility, which have been recognised as crucial for researching and evaluating machine-generated policies.


Authors (3)

Summary

The paper demonstrates that RL-based congestion control mechanisms achieve competitive fairness and TCP friendliness, yet struggle with rapid network dynamics.
The paper finds, through Mininet experiments, that models such as Astraea maintain better fairness than traditional algorithms both when competing flows share the same RTT (intra-RTT) and when their RTTs differ (inter-RTT).
The paper concludes that despite potential gains in throughput and responsiveness, further improvements in adaptability and training data are essential.


This essay provides an analysis and summary of the paper "Learning-Based vs Human-Derived Congestion Control: An In-Depth Experimental Study" (2510.25105). The paper conducts a systematic experimental evaluation comparing learning-based congestion control approaches with traditional human-derived algorithms, focusing on elements such as fairness, efficiency, and responsiveness.

Methodology and Experimental Setup

The authors run their experiments in Mininet, which emulates network conditions with a balance between fidelity and reproducibility. The congestion control (CC) algorithms tested include the human-derived TCP Cubic and BBR (version 3) and four learning-based approaches: the reinforcement learning (RL) based Orca, Sage, and Astraea, and the online-learning-based PCC Vivace.

Selected CC Approaches

The study evaluates four publicly available learning-based CC approaches:

Orca combines a learned policy with AIMD heuristics and aims for fairness by embedding power, a metric that relates throughput to delay, in its reward function.
Sage learns from pre-collected traces of established schemes such as Cubic and Vegas.
Astraea introduces fairness directly into its reward mechanism.
PCC Vivace implements gradient ascent to dynamically adapt sending rates, promoting fairness in flow allocations.

The performance of these learning-based approaches is compared against Cubic and BBRv3 in experiments covering both single- and multi-bottleneck scenarios.
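The gradient-ascent idea attributed to PCC Vivace above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the utility function and its constants, the simple loss model, and the names `utility` and `gradient_ascent_rate`. It is not Vivace's implementation, and it omits the latency term that a real utility function would typically include.

```python
def utility(rate, capacity, exponent=0.9, loss_penalty=11.35):
    """Toy utility: reward throughput, penalise loss (illustrative constants)."""
    # Assumed loss model: traffic above the bottleneck capacity is dropped.
    loss = max(0.0, (rate - capacity) / rate)
    return rate ** exponent - loss_penalty * rate * loss


def gradient_ascent_rate(capacity, rate=1.0, step=0.05, eps=0.05, rounds=1000):
    """Probe slightly above and below the current rate, then move uphill."""
    for _ in range(rounds):
        hi, lo = rate * (1 + eps), rate * (1 - eps)
        # Finite-difference estimate of d(utility)/d(rate).
        grad = (utility(hi, capacity) - utility(lo, capacity)) / (hi - lo)
        # Keep the rate positive so the loss model stays well defined.
        rate = max(0.1, rate + step * grad)
    return rate


# Hypothetical bottlenecks of 10 and 20 rate units.
rate_10 = gradient_ascent_rate(10.0)
rate_20 = gradient_ascent_rate(20.0)
```

In this toy model the sender settles slightly below the bottleneck capacity, because the loss penalty makes any overshoot costly compared with the small gain from extra throughput.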

Fairness Evaluation

An extensive evaluation of fairness in bandwidth allocation considers various scenarios, including intra-RTT fairness, inter-RTT fairness, and fairness under variable bandwidths.

Intra-RTT Fairness: Astraea performs remarkably well within its training parameter range, while Orca and Sage exhibit variability based on queuing conditions and RTT values.
Inter-RTT Fairness: Astraea continues to perform strongly (Fig. 1), whilst Vivace and Orca experience significant fairness degradation when competing flows have differing RTTs.
Fairness in Parking Lot Topology: In a multi-bottleneck setting, Astraea maintains a near-proportional allocation, demonstrating excellent fairness across different flow configurations.

Figure 1: TCP flow first, Buffer Size: 0.2 × BDP
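This summary does not state which metric the paper uses to score fairness. One widely used option is Jain's fairness index, which maps a set of per-flow throughputs to a value between 1/n (one flow takes everything) and 1 (perfectly equal shares). The sketch below is a minimal illustration; the function name `jain_index` and the sample throughputs are hypothetical.

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        raise ValueError("need at least one flow with non-zero throughput")
    return (total * total) / (n * squares)


# Hypothetical per-flow throughputs in Mbps.
equal_share = jain_index([5, 5])       # both flows get the same rate
starved_flow = jain_index([10, 0])     # one flow takes the whole link
```

With two flows, an equal split scores 1.0 and complete starvation of one flow scores 0.5, the lower bound 1/n for n = 2.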

Backward Compatibility (TCP Friendliness)

The paper also investigates how RL-based models coexist with Cubic, measuring TCP friendliness through two-flow and multi-flow setups:

Two-Flow Setup: Astraea is generally TCP friendly, allowing Cubic flows to share the bandwidth effectively under larger buffer settings.
Multi-Flow Setup: Astraea and Sage exhibit consistent friendliness to Cubic as the count of concurrent Cubic flows increases (Fig. 2).

Figure 2: Buffer Size: 0.2 × BDP
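The figure captions express buffer size as a multiple of the bandwidth-delay product (BDP), the amount of data in flight when a path is exactly full: bandwidth multiplied by round-trip time. The sketch below shows how such a buffer size could be derived. The link parameters (100 Mbps, 20 ms), the 1500-byte packet size, and the function names are assumptions for illustration, not values taken from the paper.

```python
import math


def bdp_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product in bytes (Mbps * ms = kilobits, 8 bits per byte)."""
    return bandwidth_mbps * 1000 * rtt_ms // 8


def buffer_packets(bandwidth_mbps, rtt_ms, multiplier, packet_bytes=1500):
    """Queue length in packets for a buffer sized as multiplier * BDP."""
    buffer_size = multiplier * bdp_bytes(bandwidth_mbps, rtt_ms)
    return math.ceil(buffer_size / packet_bytes)


# Hypothetical link: 100 Mbps bottleneck, 20 ms round-trip time.
bdp = bdp_bytes(100, 20)
shallow_queue = buffer_packets(100, 20, 0.2)
```

A 0.2 × BDP buffer is shallow: it can absorb only a fifth of the data in flight on a full path, so senders that overshoot the bottleneck rate see loss quickly.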

Efficiency and Responsiveness

The authors measure efficiency through aggregate network throughput and latency, revealing that:

Orca shows inefficiencies in environments with fluctuating RTTs, largely due to its reliance on static minimum RTT estimates.
Astraea and Sage offer slightly better adaptability but struggle during frequent bandwidth changes, heavily influenced by the parameter spaces used during training (Fig. 3).

Figure 3: Buffer Size: 0.2 × BDP
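The problem with a static minimum RTT estimate, noted for Orca above, can be seen in a small sketch. A sender that estimates queuing delay as the current RTT minus the lowest RTT ever observed will misread a rise in the path's base RTT as persistent queuing. The code below is not Orca's implementation; the function names, the sample values, and the windowed alternative are assumptions for illustration.

```python
from collections import deque


def static_min_queue_delay(rtt_samples):
    """Queuing-delay estimate using the lowest RTT seen since the flow started."""
    min_rtt = float("inf")
    estimates = []
    for rtt in rtt_samples:
        min_rtt = min(min_rtt, rtt)
        estimates.append(rtt - min_rtt)
    return estimates


def windowed_min_queue_delay(rtt_samples, window=5):
    """Same estimate, but the minimum only covers the most recent samples."""
    recent = deque(maxlen=window)
    estimates = []
    for rtt in rtt_samples:
        recent.append(rtt)
        estimates.append(rtt - min(recent))
    return estimates


# Hypothetical path whose base RTT jumps from 20 ms to 60 ms with an empty queue.
samples_ms = [20] * 5 + [60] * 10
static_estimate = static_min_queue_delay(samples_ms)
windowed_estimate = windowed_min_queue_delay(samples_ms, window=5)
```

After the jump, the static estimator keeps reporting 40 ms of queuing delay even though the queue is empty, which would push a delay-sensitive policy to back off needlessly. The windowed estimator returns to zero once the old samples age out.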

The responsiveness analysis also underscores the limited ability of RL models to adapt to sudden changes in network conditions, with marked performance gaps compared to human-derived algorithms such as Cubic.

Conclusion

The study concludes that while RL-based congestion control mechanisms display promising adaptability and fairness within certain conditions, significant challenges remain. Specifically, generalizability and responsiveness to fast-changing network dynamics require further enhancement.

Future directions include refining training datasets to cover a broader range of network conditions and integrating fairness and latency considerations more directly into the learning objective. Hybrid models that adjust their policies dynamically based on real-time environmental cues are suggested as a further direction for research.
