Diagonally-Addressed Matrix Nicknack: How to improve SpMV performance (2307.06305v1)
Abstract: We suggest a technique to reduce the storage size of sparse matrices without loss of information. We call this technique Diagonally-Addressed (DA) storage. It exploits the typically low matrix bandwidth of matrices arising in applications. For memory-bound algorithms, this traffic reduction has direct benefits for both uni-precision and multi-precision algorithms. In particular, we demonstrate how to apply DA storage to the Compressed Sparse Rows (CSR) format and compare the performance in computing the Sparse Matrix-Vector (SpMV) product, which is a basic building block of many iterative algorithms. We investigate 1367 matrices from the SuiteSparse Matrix Collection fitting into the CSR format using signed 32-bit indices. More than 95% of these matrices fit into the DA-CSR format using 16-bit column indices, potentially after Reverse Cuthill-McKee (RCM) reordering. Using IEEE 754 double precision scalars, we observe a performance uplift of 11% (single-threaded) or 17.5% (multithreaded) on average when the traffic exceeds the size of the last-level CPU cache. The predicted uplift in this scenario is 20%. For traffic within the CPU's combined level 2 and level 3 caches, the multithreaded performance uplift is over 40% for a few test matrices.
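To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of how a DA-CSR SpMV kernel could look. It assumes the core scheme implied by the abstract: each nonzero stores, instead of an absolute 32-bit column index, the signed 16-bit offset of its column from the row index, i.e. its distance from the main diagonal. This halves index traffic whenever the matrix bandwidth fits into an `int16`. The function name `spmv_da_csr` and the argument layout are illustrative assumptions.

```python
import numpy as np

def spmv_da_csr(row_ptr, diag_offsets, values, x):
    """Compute y = A @ x for a matrix stored in a hypothetical DA-CSR layout.

    row_ptr      : CSR row pointers, length n + 1
    diag_offsets : int16 per-nonzero offsets (column index minus row index);
                   valid only if the bandwidth fits into a signed 16-bit range
    values       : nonzero values, same length as diag_offsets
    x            : dense input vector of length n
    """
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = i + int(diag_offsets[k])  # recover the absolute column index
            y[i] += values[k] * x[j]
    return y

# Example: the 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]].
# Its bandwidth is 1, so all offsets are in {-1, 0, 1}.
row_ptr = [0, 2, 5, 7]
diag_offsets = np.array([0, 1, -1, 0, 1, -1, 0], dtype=np.int16)
values = np.array([2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
x = np.ones(3)
y = spmv_da_csr(row_ptr, diag_offsets, values, x)  # → [1.0, 0.0, 1.0]
```

The traffic saving comes purely from the index array: 2 bytes per nonzero instead of 4, which matters because SpMV is memory-bound. RCM reordering, as mentioned in the abstract, is the tool for shrinking the bandwidth of matrices that do not fit this range natively.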
- A survey of numerical linear algebra methods utilizing mixed-precision arithmetic. The International Journal of High Performance Computing Applications, 35(4):344–369, 2021.
- E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. In Proceedings of the 1969 24th national conference (ACM’69), New York, New York, USA, 1969. ACM Press. DOI 10.1145/800195.805928.
- S. Danisch and J. Krumbiegel. Makie.jl: Flexible high-performance data visualization for Julia. Journal of Open Source Software, 6(65):3349, 2021. DOI 10.21105/joss.03349.
- T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS), 38(1):1–25, 2011.
- Understanding the Performance of Sparse Matrix-Vector Multiplication. In 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008). IEEE, 2008. DOI 10.1109/pdp.2008.41.
- Intel. Math Kernel Library v2021.1, 2020. https://en.wikipedia.org/wiki/Math_Kernel_Library.
- Sparse Matrix-Vector Product, pages 103–121. Springer International Publishing, Cham, 2014. DOI 10.1007/978-3-319-06548-9_6.
- M. Leitner-Ankerl. ankerl::nanobench, 2022. https://github.com/martinus/nanobench.
- Efficient sparse matrix-vector multiplication on x86-based many-core processors. In Proceedings of the 27th International ACM Conference on Supercomputing (ICS '13), New York, New York, USA, 2013. ACM Press. DOI 10.1145/2464996.2465013.
- Utilizing Recursive Storage in Sparse Matrix-Vector Multiplication - Preliminary Considerations. In T. Philips, editor, Proceedings of the ISCA 25th International Conference on Computers and Their Applications, CATA 2010, March 24-26, 2010, Sheraton Waikiki Hotel, Honolulu, Hawaii, USA, pages 300–305. ISCA, 2010.
- T. Tantau. The TikZ and PGF Packages, 2020. Manual for version 3.1.5b. https://github.com/pgf-tikz/pgf.