Edge-Disjoint Spanning Trees on Star-Product Networks (2403.12231v2)
Abstract: A star-product operation may be used to create large graphs from smaller factor graphs. Network topologies based on star products offer several advantages, including low diameter, high scalability, and modularity. Many state-of-the-art diameter-2 and diameter-3 topologies (Slim Fly, Bundlefly, PolarStar, etc.) can be represented as star products. In this paper, we explore constructions of edge-disjoint spanning trees (EDSTs) in star-product topologies. EDSTs expose multiple parallel disjoint pathways in the network and can be leveraged to accelerate collective communication, enhance fault tolerance and network recovery, and manage congestion. Our sets of EDSTs have provably maximum or near-maximum cardinality, which amplifies these benefits. We further analyze their depths and show that, for one of our constructions, all trees have depth on the order of the depth of the EDSTs of the factor graphs, while for all other constructions a large subset of the trees has that depth.
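To make the notion of "maximum cardinality" concrete, the following is a minimal Python sketch and not the paper's construction. It assumes the usual definition of a star product, in which each vertex of one factor graph is replaced by a copy of the other and adjacent copies are joined by a bijection. Only the simplest special case is used here: the Cartesian product, where every bijection is the identity. The function names (`cartesian_product`, `edst_upper_bound`, `is_spanning_tree`) and the two hand-picked trees are illustrative assumptions. The bound itself is elementary: a spanning tree uses |V| - 1 edges, so at most floor(|E| / (|V| - 1)) trees can be pairwise edge-disjoint.

```python
def cartesian_product(g_edges, h_edges):
    """Cartesian product of G and H: the star-product special case in which
    every bijection between supernodes is the identity (assumption)."""
    g_nodes = sorted({v for e in g_edges for v in e})
    h_nodes = sorted({v for e in h_edges for v in e})
    vertices = [(x, h) for x in g_nodes for h in h_nodes]
    edges = set()
    for x in g_nodes:              # a copy of H inside each supernode x
        for a, b in h_edges:
            edges.add(frozenset({(x, a), (x, b)}))
    for x, y in g_edges:           # identity matching between adjacent supernodes
        for h in h_nodes:
            edges.add(frozenset({(x, h), (y, h)}))
    return vertices, edges


def edst_upper_bound(num_vertices, num_edges):
    """At most floor(|E| / (|V| - 1)) spanning trees can be edge-disjoint."""
    return num_edges // (num_vertices - 1)


def is_spanning_tree(vertices, tree_edges):
    """True if tree_edges has |V| - 1 edges and no cycle (union-find check)."""
    if len(tree_edges) != len(vertices) - 1:
        return False
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for edge in tree_edges:
        u, w = tuple(edge)
        ru, rw = find(u), find(w)
        if ru == rw:               # edge would close a cycle
            return False
        parent[ru] = rw
    return True


def as_edges(pairs):
    return {frozenset(p) for p in pairs}


# Example: the 3x3 torus C3 x C3 (9 vertices, 18 edges).
c3 = [(0, 1), (1, 2), (2, 0)]
vertices, edges = cartesian_product(c3, c3)
bound = edst_upper_bound(len(vertices), len(edges))

# Two hand-picked trees (illustrative, not produced by the paper's method).
tree1 = as_edges([
    ((0, 0), (0, 1)), ((0, 1), (0, 2)),
    ((1, 1), (1, 2)), ((1, 2), (1, 0)),
    ((2, 2), (2, 0)), ((2, 0), (2, 1)),
    ((0, 0), (1, 0)), ((1, 0), (2, 0)),
])
tree2 = as_edges([
    ((0, 1), (1, 1)), ((1, 1), (2, 1)),
    ((0, 2), (1, 2)), ((1, 2), (2, 2)),
    ((2, 1), (2, 2)), ((0, 2), (0, 0)),
    ((1, 0), (1, 1)), ((2, 0), (0, 0)),
])
trees_ok = (
    all(t <= edges and is_spanning_tree(vertices, t) for t in (tree1, tree2))
    and not (tree1 & tree2)        # no shared edge
)
print(bound, trees_ok)
```

In this small case the two trees meet the counting bound, so the set has maximum cardinality. In general the bound is only an upper limit and need not be attainable.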