Low-Bandwidth Matrix Multiplication: Faster Algorithms and More General Forms of Sparsity (2404.15559v2)
Abstract: In prior work, Gupta et al. (SPAA 2022) presented a distributed algorithm for multiplying sparse $n \times n$ matrices, using $n$ computers. They assumed that the input matrices are uniformly sparse (there are at most $d$ non-zeros in each row and column) and that the task is to compute a uniformly sparse part of the product matrix. The sparsity structure is globally known in advance (this is the supported setting). As input, each computer receives one row of each input matrix, and each computer needs to output one row of the product matrix. In each communication round each computer can send and receive one $O(\log n)$-bit message. Their algorithm solves this task in $O(d^{1.907})$ rounds, while the trivial bound is $O(d^2)$. We improve on the prior work in two dimensions: First, we show that we can solve the same task faster, in only $O(d^{1.832})$ rounds. Second, we explore what happens when matrices are not uniformly sparse. We consider the following alternative notions of sparsity: row-sparse matrices (at most $d$ non-zeros per row), column-sparse matrices, matrices with bounded degeneracy (we can recursively delete a row or column with at most $d$ non-zeros), average-sparse matrices (at most $dn$ non-zeros in total), and general matrices.
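The sparsity notions listed in the abstract can be made concrete with a small sketch. The code below is an illustration only, not taken from the paper: the function names and the representation of a matrix as a set of `(row, column)` positions of its non-zeros are assumptions made for this example. The degeneracy check follows the abstract's wording literally, repeatedly deleting any row or column that currently holds at most `d` non-zeros.

```python
from collections import Counter

# Hypothetical representation: a matrix is the set of (row, column)
# positions of its non-zero entries; the values themselves are irrelevant here.

def row_counts(nz):
    """Number of non-zeros in each row that has at least one."""
    return Counter(i for i, _ in nz)

def col_counts(nz):
    """Number of non-zeros in each column that has at least one."""
    return Counter(j for _, j in nz)

def is_row_sparse(nz, d):
    return all(c <= d for c in row_counts(nz).values())

def is_column_sparse(nz, d):
    return all(c <= d for c in col_counts(nz).values())

def is_uniformly_sparse(nz, d):
    # At most d non-zeros in every row and in every column.
    return is_row_sparse(nz, d) and is_column_sparse(nz, d)

def is_average_sparse(nz, n, d):
    # At most d*n non-zeros in total for an n x n matrix.
    return len(nz) <= d * n

def has_bounded_degeneracy(nz, d):
    # Greedily delete a row or column with at most d remaining non-zeros.
    # Deletions only lower the other counts, so the greedy order is safe.
    remaining = set(nz)
    while remaining:
        rows, cols = row_counts(remaining), col_counts(remaining)
        light_row = next((i for i, c in rows.items() if c <= d), None)
        if light_row is not None:
            remaining = {(i, j) for i, j in remaining if i != light_row}
            continue
        light_col = next((j for j, c in cols.items() if c <= d), None)
        if light_col is None:
            return False  # every remaining row and column is too heavy
        remaining = {(i, j) for i, j in remaining if j != light_col}
    return True

# An "arrow" pattern: first row and first column are full, the rest is empty.
n = 5
arrow = {(0, j) for j in range(n)} | {(i, 0) for i in range(n)}
print(is_uniformly_sparse(arrow, 1), has_bounded_degeneracy(arrow, 1))
```

The arrow pattern shows why the notions differ: its first row and column are dense, so it is not uniformly sparse for small `d`, yet peeling off the light rows and columns one at a time empties it, so its degeneracy is small.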
- Distributed computation in node-capacitated networks. In Christian Scheideler and Petra Berenbrink, editors, The 31st ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, Phoenix, AZ, USA, June 22-24, 2019, pages 69–79. ACM, 2019. doi:10.1145/3323165.3323195.
- Fast approximate shortest paths in the congested clique. Distributed Computing, 34(6):463–487, 2021. doi:10.1007/S00446-020-00380-5.
- Algebraic methods in the congested clique. Distributed Computing, 32(6):461–478, 2019. doi:10.1007/S00446-016-0270-2.
- On distributed listing of cliques. In Yuval Emek and Christian Cachin, editors, PODC ’20: ACM Symposium on Principles of Distributed Computing, Virtual Event, Italy, August 3-7, 2020, pages 474–482. ACM, 2020. doi:10.1145/3382734.3405742.
- Sparse matrix multiplication and triangle listing in the congested clique model. In Jiannong Cao, Faith Ellen, Luís Rodrigues, and Bernardo Ferreira, editors, 22nd International Conference on Principles of Distributed Systems, OPODIS 2018, December 17-19, 2018, Hong Kong, China, volume 125 of LIPIcs, pages 4:1–4:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPICS.OPODIS.2018.4.
- Near-optimal distributed triangle enumeration via expander decompositions. Journal of the ACM, 68(3):21:1–21:36, 2021. doi:10.1145/3446330.
- Improved distributed expander decomposition and nearly optimal triangle enumeration. In Peter Robinson and Faith Ellen, editors, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Toronto, ON, Canada, July 29 - August 2, 2019, pages 66–73. ACM, 2019. doi:10.1145/3293611.3331618.
- Upper and lower time bounds for parallel random access machines without simultaneous writes. SIAM Journal on Computing, 15(1):87–97, 1986. doi:10.1137/0215006.
- "tri, tri again": Finding triangles and small subgraphs in a distributed setting - (extended abstract). In Marcos K. Aguilera, editor, Distributed Computing - 26th International Symposium, DISC 2012, Salvador, Brazil, October 16-18, 2012. Proceedings, volume 7611 of Lecture Notes in Computer Science, pages 195–209. Springer, 2012. doi:10.1007/978-3-642-33651-5_14.
- On the power of preprocessing in decentralized network optimization. In 2019 IEEE Conference on Computer Communications, INFOCOM 2019, Paris, France, April 29 - May 2, 2019, pages 1450–1458. IEEE, 2019. doi:10.1109/INFOCOM.2019.8737382.
- Does preprocessing help under congestion? In Peter Robinson and Faith Ellen, editors, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Toronto, ON, Canada, July 29 - August 2, 2019, pages 259–261. ACM, 2019. doi:10.1145/3293611.3331581.
- Sparse matrix multiplication in the low-bandwidth model. In Kunal Agrawal and I-Ting Angelina Lee, editors, SPAA ’22: 34th ACM Symposium on Parallelism in Algorithms and Architectures, Philadelphia, PA, USA, July 11 - 14, 2022, pages 435–444. ACM, 2022. doi:10.1145/3490148.3538575.
- Taisuke Izumi and François Le Gall. Triangle finding and listing in CONGEST networks. In Elad Michael Schiller and Alexander A. Schwarzmann, editors, Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC 2017, Washington, DC, USA, July 25-27, 2017, pages 381–389. ACM, 2017. doi:10.1145/3087801.3087811.
- Deterministic subgraph detection in broadcast CONGEST. In James Aspnes, Alysson Bessani, Pascal Felber, and João Leitão, editors, 21st International Conference on Principles of Distributed Systems, OPODIS 2017, Lisbon, Portugal, December 18-20, 2017, volume 95 of LIPIcs, pages 4:1–4:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPICS.OPODIS.2017.4.
- François Le Gall. Further algebraic algorithms in the congested clique model and applications to graph-theoretic problems. In Cyril Gavoille and David Ilcinkas, editors, Distributed Computing - 30th International Symposium, DISC 2016, Paris, France, September 27-29, 2016. Proceedings, volume 9888 of Lecture Notes in Computer Science, pages 57–70. Springer, 2016. doi:10.1007/978-3-662-53426-7_5.
- MST construction in O(log log n) communication rounds. In Arnold L. Rosenberg and Friedhelm Meyer auf der Heide, editors, SPAA 2003: Proceedings of the Fifteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, June 7-9, 2003, San Diego, California, USA (part of FCRC 2003), pages 94–100. ACM, 2003. doi:10.1145/777412.777428.
- On the distributed complexity of large-scale graph computations. ACM Transactions on Parallel Computing, 8(2):7:1–7:28, 2021. doi:10.1145/3460900.
- Exploiting locality in distributed SDN control. In Nate Foster and Rob Sherwood, editors, Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, HotSDN 2013, The Chinese University of Hong Kong, Hong Kong, China, Friday, August 16, 2013, pages 121–126. ACM, 2013. doi:10.1145/2491185.2491198.
- Leslie G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103–111, 1990. doi:10.1145/79173.79181.
- New bounds for matrix multiplication: from alpha to omega. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3792–3835. Society for Industrial and Applied Mathematics, 2024. doi:10.1137/1.9781611977912.134.