Integrated Topology and Traffic Engineering for Reconfigurable Datacenter Networks (2402.09115v1)

Published 14 Feb 2024 in cs.NI

Abstract: State-of-the-art datacenter network topologies are fixed and based on electrical switching technology, and their throughput and cost are by now well understood. Over the past few years, researchers have been developing novel optical switching technologies that enable reconfigurable datacenter networks (RDCNs), which support dynamic physical topologies. The design of dynamic topologies, i.e., 'Topology Engineering,' is still in its infancy. Different designs offer distinct advantages, such as faster switch reconfiguration times or demand-aware topologies, and it remains unclear which design maximizes throughput. This paper aims to improve our analytical understanding of reconfigurable networks and formally studies their throughput by presenting a general, unifying model for dynamic networks and their topology and traffic engineering. We use our model to study demand-oblivious and demand-aware systems and prove new upper bounds on the throughput of a system as a function of its topology and traffic schedules. Next, we offer a novel system design that combines demand-oblivious and demand-aware schedules, and we prove its throughput superiority under a large family of demand matrices. We evaluate our design numerically for sparse and dense traffic and show that it can outperform other designs by up to 25% under common network parameters.
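To make the demand-oblivious versus demand-aware distinction concrete, the sketch below compares two schedules in a deliberately simplified fluid model. The model is an assumption of this example, not the paper's model: n nodes, one reconfigurable port of unit capacity per node, single-hop routing only, and zero reconfiguration delay. The demand-oblivious schedule cycles round-robin through the n-1 cyclic-shift matchings (a rotor-style design); the demand-aware schedule uses a greedy Birkhoff-von Neumann decomposition of a doubly stochastic demand matrix into weighted matchings. All function names are hypothetical.

```python
from fractions import Fraction


def round_robin_direct_throughput(demand):
    """Demand-oblivious schedule (toy model): cycle through the n-1 cyclic-shift
    matchings i -> (i + k) mod n, k = 1..n-1, each for 1/(n-1) of the time.
    With single-hop routing, pair (i, j) gets capacity 1/(n-1), so demand can be
    scaled by theta only while theta * demand[i][j] <= 1/(n-1) for all i != j."""
    n = len(demand)
    peak = max(Fraction(demand[i][j]) for i in range(n) for j in range(n) if i != j)
    if peak == 0:
        raise ValueError("demand matrix has no off-diagonal traffic")
    return Fraction(1, n - 1) / peak


def perfect_matching(support):
    """Kuhn's augmenting-path algorithm: returns perm with perm[i] = column for
    row i, or None if the bipartite support graph has no perfect matching."""
    n = len(support)
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm


def birkhoff_von_neumann(demand):
    """Demand-aware schedule (toy model): greedily decompose a doubly stochastic
    matrix into weighted permutation matrices (matchings) whose weights sum to 1."""
    n = len(demand)
    residual = [[Fraction(x) for x in row] for row in demand]
    if any(sum(row) != 1 for row in residual) or any(sum(col) != 1 for col in zip(*residual)):
        raise ValueError("demand matrix must be doubly stochastic")
    schedule = []  # list of (fraction of time, permutation)
    while any(residual[i][j] > 0 for i in range(n) for j in range(n)):
        perm = perfect_matching([[residual[i][j] > 0 for j in range(n)] for i in range(n)])
        if perm is None:  # cannot happen for doubly stochastic input (Birkhoff)
            raise ValueError("no perfect matching on the residual support")
        weight = min(residual[i][perm[i]] for i in range(n))
        for i in range(n):
            residual[i][perm[i]] -= weight  # zeroes at least one entry per round
        schedule.append((weight, perm))
    return schedule


def scheduled_capacity(schedule, n):
    """Fraction of time each ordered pair (i, j) is directly connected."""
    cap = [[Fraction(0)] * n for _ in range(n)]
    for weight, perm in schedule:
        for i in range(n):
            cap[i][perm[i]] += weight
    return cap


n = 4
shift = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
print(round_robin_direct_throughput(shift))  # 1/3
print(birkhoff_von_neumann(shift))  # [(Fraction(1, 1), [1, 2, 3, 0])]
```

For a permutation (shift) demand on 4 nodes, the round-robin schedule sustains only a 1/3 fraction of the demand, while the decomposition serves it with a single matching; for uniform all-to-all demand, both schedules reach full throughput. Because this toy model ignores reconfiguration delay and multi-hop routing, it overstates the demand-aware advantage and says nothing about the reconfiguration-time trade-off the abstract mentions.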
