FairQ: Fair and Fast Rate Allocation in Data Centers (2401.04850v1)
Abstract: The peculiar congestion patterns in data centers are caused by the bursty and composite nature of traffic, the small bandwidth-delay product, and the tiny switch buffers. It is not practical to modify TCP to adapt to data centers, especially in public clouds where multiple congestion control protocols coexist. In this work, we design a switch-based method to address such congestion issues. Our approach does not require any modification to TCP, which enables easy and seamless deployment in public data centers via a switch software update. We first present a simple analysis to demonstrate the stability and effectiveness of the scheme, and then we discuss a NetFPGA-based hardware switch prototype. The experimental results from real deployments in a small testbed cluster show the effectiveness of our approach.
Authors: Ahmed M. Abdelmoniem, Brahim Bensaou