FASTFLOW: Flexible Adaptive Congestion Control for High-Performance Datacenters (2404.01630v3)
Abstract: The increasing demands of ML workloads in datacenters place significant stress on current congestion control (CC) algorithms, many of which struggle to maintain performance at scale. These workloads generate bursty, synchronized traffic that requires both rapid response and fairness across flows. Unfortunately, existing CC algorithms that rely heavily on delay as their primary congestion signal often fail to react quickly enough and do not consistently ensure fairness. In this paper, we propose FASTFLOW, a streamlined sender-based CC algorithm that integrates delay, ECN signals, and optional packet trimming to make precise, real-time adjustments to congestion windows. Central to FASTFLOW is the QuickAdapt mechanism, which provides accurate bandwidth estimation at the receiver, enabling faster reactions to network conditions. We also show that FASTFLOW can effectively enhance receiver-based algorithms such as EQDS by improving their ability to manage in-network congestion. Our evaluation shows that FASTFLOW outperforms state-of-the-art solutions, including EQDS, Swift, BBR, and MPRDMA, delivering up to 50% performance improvements in modern datacenter networks.
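To make the idea of combining several congestion signals more concrete, the Python sketch below shows one way a sender could fold delay, ECN marks, and trimmed-packet feedback into per-ACK window adjustments, plus a QuickAdapt-style step that caps the window at the bandwidth-delay product implied by a receiver-measured delivery rate. This is an illustration under stated assumptions, not FASTFLOW's published algorithm: the names `AckFeedback`, `SenderState`, `on_ack`, and `quick_adapt`, the MTU, RTT thresholds, gains, and the exact update rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AckFeedback:
    """Per-ACK congestion feedback seen by the sender (hypothetical fields)."""
    rtt_us: float      # measured round-trip time, microseconds
    ecn_marked: bool   # acked packet carried an ECN Congestion Experienced mark
    trimmed: bool      # a switch trimmed the payload (header-only NACK)
    acked_bytes: int   # payload bytes acknowledged by this ACK


@dataclass
class SenderState:
    cwnd: float                      # congestion window, bytes
    mtu: int = 4096                  # assumed packet size, bytes
    base_rtt_us: float = 10.0        # assumed unloaded RTT, microseconds
    target_rtt_us: float = 15.0      # assumed delay threshold, microseconds
    max_cwnd: float = 1_000_000.0    # assumed upper bound, bytes


def on_ack(s: SenderState, fb: AckFeedback) -> None:
    """Hypothetical per-ACK window update combining delay, ECN and trimming."""
    high_delay = fb.rtt_us > s.target_rtt_us
    if fb.trimmed:
        # A trimmed packet implies a queue overflowed: shrink by one MTU.
        s.cwnd -= s.mtu
    elif fb.ecn_marked and high_delay:
        # ECN and delay agree on congestion: remove half of the acked bytes.
        s.cwnd -= 0.5 * fb.acked_bytes
    elif fb.ecn_marked or high_delay:
        # Only one signal fired (e.g. a transient burst): hold the window.
        pass
    else:
        # No congestion: additive increase of roughly one MTU per window.
        s.cwnd += s.mtu * fb.acked_bytes / max(s.cwnd, s.mtu)
    # Keep the window between one packet and the configured maximum.
    s.cwnd = min(max(s.cwnd, float(s.mtu)), s.max_cwnd)


def quick_adapt(s: SenderState, delivered_bytes: int, interval_us: float) -> None:
    """Hypothetical QuickAdapt-style clamp: cap cwnd at the bandwidth-delay
    product implied by the delivery rate the receiver measured over an interval."""
    rate = delivered_bytes / interval_us   # bytes per microsecond
    bdp = rate * s.base_rtt_us             # bytes in flight at that rate
    s.cwnd = min(s.cwnd, max(bdp, float(s.mtu)))


# Usage: one uncongested ACK grows the window by mtu * acked / cwnd.
s = SenderState(cwnd=40_000.0)
on_ack(s, AckFeedback(rtt_us=12.0, ecn_marked=False, trimmed=False, acked_bytes=4096))
print(round(s.cwnd, 2))  # 40419.43
```

Requiring both ECN and delay to agree before cutting the window is one hypothetical way to avoid overreacting to transient marks, while trimming is treated as the strongest signal because it indicates a full queue. The receiver-rate clamp lets the sender jump directly to a plausible window after heavy congestion instead of converging step by step.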
- Implementing packet trimming support in hardware. (2022). arXiv:cs.NI/2207.04967
- CONGA: Distributed Congestion-Aware Load Balancing for Datacenters. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM ’14). Association for Computing Machinery, New York, NY, USA, 503–514. https://doi.org/10.1145/2619239.2626316
- Data Center TCP (DCTCP). SIGCOMM Comput. Commun. Rev. 40, 4 (Aug. 2010), 63–74. https://doi.org/10.1145/1851275.1851192
- Data Center TCP (DCTCP). In Proceedings of the ACM SIGCOMM 2010 Conference (SIGCOMM ’10). Association for Computing Machinery, New York, NY, USA, 63–74. https://doi.org/10.1145/1851182.1851192
- Less Is More: Trading a Little Bandwidth for Ultra-Low Latency in the Data Center. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). USENIX Association, San Jose, CA, 253–266. https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/alizadeh
- Bolt: Sub-RTT Congestion Control for Ultra-Low Latency. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). USENIX Association, Boston, MA, 219–236. https://www.usenix.org/conference/nsdi23/presentation/arslan
- Empowering Azure Storage with RDMA. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). USENIX Association, Boston, MA, 49–67. https://www.usenix.org/conference/nsdi23/presentation/bai
- Maciej Besta and Torsten Hoefler. 2014. Slim Fly: A Cost Effective Low-Diameter Network Topology. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’14). IEEE Press, 348–359. https://doi.org/10.1109/SC.2014.34
- Broadcom. 2024a. Deploying AI/ML training clusters with IP/Ethernet. (2024). https://www.broadcom.com/blog/deploying-ai-ml-training-clusters-with-ip-ethernet (accessed 01/24).
- Broadcom. 2024b. Tomahawk 5 Switch. (2024). https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm78900-series (accessed 01/24).
- Per-Packet Load-Balanced, Low-Latency Routing for Clos-Based Data Center Networks. In Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT ’13). Association for Computing Machinery, New York, NY, USA, 49–60. https://doi.org/10.1145/2535372.2535375
- BBR: Congestion-Based Congestion Control. Commun. ACM 60 (2017), 58–66. http://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext
- V. Cerf and R. Kahn. 1974. A Protocol for Packet Network Intercommunication. IEEE Transactions on Communications 22, 5 (1974), 637–648. https://doi.org/10.1109/TCOM.1974.1092259
- Understanding TCP Incast Throughput Collapse in Datacenter Networks. In Proceedings of the 1st ACM Workshop on Research on Enterprise Networking (WREN ’09). Association for Computing Machinery, New York, NY, USA, 73–82. https://doi.org/10.1145/1592681.1592693
- Catch the Whole Lot in an Action: Rapid Precise Packet Loss Notification in Data Center. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14). USENIX Association, Seattle, WA, 17–28. https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/cheng
- D. Amodei and D. Hernandez. 2018. AI and Compute. (2018). https://openai.com/research/ai-and-compute (accessed 09/23).
- Noise in the Clouds: Influence of Network Performance Variability on Application Scalability. Proc. ACM Meas. Anal. Comput. Syst. 6, 3, Article 49 (Dec. 2022), 27 pages. https://doi.org/10.1145/3570609 arXiv:2210.15315
- Mitigating Network Noise on Dragonfly Networks through Application-Aware Routing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19). Association for Computing Machinery, New York, NY, USA, Article 16, 32 pages. https://doi.org/10.1145/3295500.3356196
- An In-Depth Analysis of the Slingshot Interconnect. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1–14. https://doi.org/10.1109/SC41405.2020.00039
- Jeffrey Dean and Luiz André Barroso. 2013. The Tail at Scale. Commun. ACM 56, 2 (Feb. 2013), 74–80. https://doi.org/10.1145/2408776.2408794
- On the impact of packet spraying in data center networks. In 2013 Proceedings IEEE INFOCOM. 2130–2138. https://doi.org/10.1109/INFCOM.2013.6567015
- S. Floyd and V. Jacobson. 1993. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking 1, 4 (1993), 397–413. https://doi.org/10.1109/90.251892
- The Addition of Explicit Congestion Notification (ECN) to IP. RFC 3168. (Sept. 2001). https://doi.org/10.17487/RFC3168
- PHost: Distributed near-Optimal Datacenter Transport over Commodity Network Fabric. In Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT ’15). Association for Computing Machinery, New York, NY, USA, Article 1, 12 pages. https://doi.org/10.1145/2716281.2836086
- DRILL: Micro Load Balancing for Low-Latency Data Center Networks. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’17). Association for Computing Machinery, New York, NY, USA, 225–238. https://doi.org/10.1145/3098822.3098839
- Aquila: A unified, low-latency fabric for datacenter networks. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA, 1249–1266. https://www.usenix.org/conference/nsdi22/presentation/gibson
- Backpressure Flow Control. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA, 779–805. https://www.usenix.org/conference/nsdi22/presentation/goyal
- BCube: A High Performance, Server-Centric Network Architecture for Modular Data Centers. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication (SIGCOMM ’09). Association for Computing Machinery, New York, NY, USA, 63–74. https://doi.org/10.1145/1592568.1592577
- Re-Architecting Datacenter Networks and Stacks for Low Latency and High Performance. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’17). Association for Computing Machinery, New York, NY, USA, 29–42. https://doi.org/10.1145/3098822.3098825
- HammingMesh: A Network Topology for Large-Scale Deep Learning. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’22). IEEE Press, Article 11, 18 pages.
- Data Center Ethernet and Remote Direct Memory Access: Issues at Hyperscale. Computer 56, 7 (2023), 67–77. https://doi.org/10.1109/MC.2023.3261184
- The Effect of Network Noise on Large-Scale Collective Communications. Parallel Processing Letters (PPL) 19, 4 (Aug. 2009), 573–593.
- C. Hopps. 2000. Analysis of an Equal-Cost Multi-Path Algorithm. RFC 2992. (Nov. 2000). https://www.ietf.org/rfc/rfc2992.txt
- Network Endpoint Congestion Control for Fine-Grained Communication. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’15). Association for Computing Machinery, New York, NY, USA, Article 35, 12 pages. https://doi.org/10.1145/2807591.2807600
- FlowBender: Flow-level Adaptive Routing for Improved Latency and Throughput in Datacenter Networks. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies (CoNEXT ’14). Association for Computing Machinery, New York, NY, USA, 149–160. https://doi.org/10.1145/2674005.2674985
- Technology-Driven, Highly-Scalable Dragonfly Topology. In 2008 International Symposium on Computer Architecture. 77–88. https://doi.org/10.1109/ISCA.2008.19
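- Swift: Delay is Simple and Effective for Congestion Control in the Datacenter. In Proceedings of the ACM SIGCOMM 2020 Conference (SIGCOMM ’20). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3387514.3406591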
- DX: Latency-Based Congestion Control for Datacenters. IEEE/ACM Transactions on Networking 25, 1 (2017), 335–348. https://doi.org/10.1109/TNET.2016.2587286
- HPCC: High Precision Congestion Control. In Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM ’19). Association for Computing Machinery, New York, NY, USA, 44–58. https://doi.org/10.1145/3341302.3342085
- Multi-path transport for RDMA in datacenters. In Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation (NSDI’18). USENIX Association, USA, 357–371.
- TIMELY: RTT-based Congestion Control for the Datacenter. In SIGCOMM ’15.
- Revisiting Network Support for RDMA. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’18). Association for Computing Machinery, New York, NY, USA, 313–326. https://doi.org/10.1145/3230543.3230557
- Homa: A Receiver-Driven Low-Latency Transport Protocol Using Network Priorities. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM ’18). Association for Computing Machinery, New York, NY, USA, 221–235. https://doi.org/10.1145/3230543.3230564
- Kathleen Nichols and Van Jacobson. 2012. Controlling Queue Delay: A modern AQM is just one piece of the solution to bufferbloat. Queue 10, 5 (May 2012), 20–34. https://doi.org/10.1145/2208917.2209336
- Nvidia. 2024. Networking for the Era of AI: The Network Defines the Data Center. (2024). https://nvdam.widen.net/s/bvpmlkbgzt/networking-overall-whitepaper-networking-for-ai-2911204 (accessed 01/24).
- An edge-queued datagram service for all datacenter traffic. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA, 761–777. https://www.usenix.org/conference/nsdi22/presentation/olteanu
- Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking. In Proceedings of ACM SIGCOMM 2022.
- PLB: congestion signals are simple and effective for network load balancing. In Proceedings of the ACM SIGCOMM 2022 Conference (SIGCOMM ’22). Association for Computing Machinery, New York, NY, USA, 207–218. https://doi.org/10.1145/3544216.3544226
- Congestion control in machine learning clusters. In Proceedings of the 21st ACM Workshop on Hot Topics in Networks (HotNets ’22). Association for Computing Machinery, New York, NY, USA, 235–242. https://doi.org/10.1145/3563766.3564115
- Adaptive Routing in InfiniBand Hardware. In 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 463–472. https://doi.org/10.1109/CCGrid54584.2022.00056
- Inside the Social Network’s (Datacenter) Network. SIGCOMM Comput. Commun. Rev. 45, 4 (Aug. 2015), 123–137. https://doi.org/10.1145/2829988.2787472
- HINT: Supporting Congestion Control Decisions with P4-driven In-Band Network Telemetry. In 2023 IEEE 24th International Conference on High Performance Switching and Routing (HPSR). 83–88. https://doi.org/10.1109/HPSR57248.2023.10147977
- A Cloud-Optimized Transport Protocol for Elastic and Scalable HPC. IEEE Micro 40, 6 (2020), 67–73. https://doi.org/10.1109/MM.2020.3016891
- Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network. In SIGCOMM ’15.
- Surviving switch failures in cloud datacenters. SIGCOMM Comput. Commun. Rev. 51, 2 (May 2021), 2–9. https://doi.org/10.1145/3464994.3464996
- RoCC: Robust Congestion Control for RDMA. In Proceedings of the 16th International Conference on Emerging Networking EXperiments and Technologies (CoNEXT ’20). Association for Computing Machinery, New York, NY, USA, 17–30. https://doi.org/10.1145/3386367.3431316
- The Computational Limits of Deep Learning. (2022). arXiv:cs.LG/2007.05558
- Deadline-Aware Datacenter TCP (D2TCP). In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM ’12). Association for Computing Machinery, New York, NY, USA, 115–126. https://doi.org/10.1145/2342356.2342388
- Congestion Control Using In-Network Telemetry for Lossless Datacenters. Computers, Materials & Continua 75, 1 (2023), 1195–1212. https://doi.org/10.32604/cmc.2023.035932
- Tuning ECN for Data Center Networks. In ACM CoNEXT ’12. https://www.microsoft.com/en-us/research/publication/tuning-ecn-for-data-center-networks/
- EMPTCP: An ECN Based Approach to Detect Shared Bottleneck in MPTCP. In 2019 28th International Conference on Computer Communication and Networks (ICCCN). 1–10. https://doi.org/10.1109/ICCCN.2019.8847013
- High-Resolution Measurement of Data Center Microbursts. In Proceedings of the 2017 Internet Measurement Conference (IMC ’17). Association for Computing Machinery, New York, NY, USA, 78–85. https://doi.org/10.1145/3131365.3131375
- PACC: Proactive and Accurate Congestion Feedback for RDMA Congestion Control. In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications. 2228–2237. https://doi.org/10.1109/INFOCOM48880.2022.9796803
- ExpressPass++: Credit-Efficient Congestion Control for Data Centers. In 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). 46–52. https://doi.org/10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00018
- Congestion Control for Large-Scale RDMA Deployments. In Proceedings of the ACM SIGCOMM 2015 Conference (SIGCOMM ’15). Association for Computing Machinery, New York, NY, USA. https://www.microsoft.com/en-us/research/publication/congestion-control-for-large-scale-rdma-deployments/