Characterizing TCP's Performance for Low-Priority Flows Inside a Cloud (2401.08890v1)
Abstract: Many cloud systems use low-priority flows to achieve various performance objectives (e.g., low latency, high utilization), relying on TCP as their preferred transport protocol. However, the suitability of TCP for such low-priority flows is relatively unexplored; in particular, it is unclear how prioritization-induced delays in packet transmission can cause spurious timeouts and low utilization. In this paper, we conduct an empirical study of TCP's performance for low-priority flows under a wide range of realistic scenarios: use cases (with accompanying workloads) in which the performance of low-priority flows is crucial to the functioning of the overall system, as well as various network loads and other network parameters. Our findings yield two key insights: (1) for several popular use cases (e.g., network scheduling), TCP's performance for low-priority flows is within 2x of a near-optimal scheme; and (2) for emerging workloads that exhibit on-off behavior in the high-priority queue (e.g., distributed ML model training), TCP's performance for low-priority flows is poor. Finally, we discuss and conduct a preliminary evaluation of two simple strategies, weighted fair queuing (WFQ) and cross-queue congestion notification, and show that they can substantially improve TCP's performance for low-priority flows.
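To make the spurious-timeout mechanism concrete, the following is a minimal toy sketch, not the paper's experimental setup. It models a link that sends one packet per time slot, an on-off high-priority source, and an always-backlogged low-priority flow, and it measures the longest run of slots in which the low-priority flow sends nothing. The function name, the slot-based model, the weights, and the `RTO_SLOTS` value are all illustrative assumptions, and weighted round-robin is used here as a simple stand-in for WFQ.

```python
def longest_low_priority_stall(policy, on_slots, off_slots, total_slots,
                               w_hi=3, w_lo=1):
    """Longest run of consecutive slots with no low-priority transmission.

    Toy model (all parameters hypothetical): the link sends one packet per
    slot, the high-priority source adds one packet per slot during its ON
    phase, and the low-priority queue always has a packet waiting.
    """
    hi_backlog = 0
    stall = 0
    longest = 0
    period = on_slots + off_slots
    for t in range(total_slots):
        if t % period < on_slots:
            hi_backlog += 1  # ON phase: one high-priority arrival per slot

        if policy == "strict":
            # Strict priority: high-priority traffic always goes first.
            serve_hi = hi_backlog > 0
        elif policy == "wrr":
            # Weighted round-robin (stand-in for WFQ): the high-priority
            # queue owns w_hi of every (w_hi + w_lo) slots; unused
            # high-priority slots fall through to the low-priority queue.
            hi_turn = t % (w_hi + w_lo) < w_hi
            serve_hi = hi_turn and hi_backlog > 0
        else:
            raise ValueError(f"unknown policy: {policy}")

        if serve_hi:
            hi_backlog -= 1
            stall += 1
            longest = max(longest, stall)
        else:
            stall = 0  # a low-priority packet was sent in this slot
    return longest


RTO_SLOTS = 20  # assumed retransmission timeout, expressed in slots

for policy in ("strict", "wrr"):
    stall = longest_low_priority_stall(policy, on_slots=50, off_slots=50,
                                       total_slots=1000)
    verdict = "exceeds" if stall > RTO_SLOTS else "stays within"
    print(f"{policy}: longest stall = {stall} slots ({verdict} the assumed RTO)")
```

Under these assumed parameters, strict priority stalls the low-priority flow for the entire 50-slot ON phase, which is longer than the assumed timeout even though no packet is lost, whereas the weighted scheduler bounds the stall at `w_hi` slots. The sketch ignores TCP dynamics such as window growth and RTO estimation, so it only illustrates why a guaranteed minimum share can avoid timeout-triggering gaps.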