SpComp: A Sparsity Structure-Specific Compilation of Matrix Operations (2307.06109v1)
Abstract: Sparse matrix operations involve a large number of zero operands, which makes most of the computation redundant. The redundancy is magnified when a matrix operation executes repeatedly on sparse data. Optimizing matrix operations for sparsity involves reorganizing either the data or the computations, at compile time or at run time. Although compile-time techniques avoid introducing run-time overhead, they are either limited to simple sparse matrix operations that generate dense output and handle immutable sparse matrices, or they require manual intervention to customize the technique to different matrix operations. We contribute a compile-time technique, SpComp, that optimizes a sparse matrix operation by automatically customizing its computations to the positions of the non-zero values in the data. Our approach neither incurs run-time overhead nor requires manual intervention, and it applies to complex matrix operations that generate sparse output and handle mutable sparse matrices. We introduce a data-flow analysis, named Essential Indices Analysis, that statically collects symbolic information about the computations and helps the code generator reorganize them. The generated code consists of piecewise-regular loops, free from indirect references and amenable to further optimization. SpComp-generated SpMSpV code shows a substantial performance gain over the state-of-the-art TACO compiler and the piecewise-regular code generator: on average, 79% against TACO and 83% against the piecewise-regular code generator. Compared against the CHOLMOD library, SpComp-generated sparse Cholesky decomposition code shows a 65% performance gain on average.
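To make the idea of sparsity-structure-specific code concrete, the minimal C sketch below contrasts a generic CSR SpMV kernel, which reads index arrays through indirect references, with a kernel specialized by hand to one hypothetical 4x4 sparsity pattern in which every access is a compile-time constant. It only illustrates the general flavor of piecewise-regular, indirection-free code; it is not SpComp's actual output, its SpMSpV kernel, or its Essential Indices Analysis.

```c
/* Illustrative sketch: generic CSR SpMV vs. code specialized to a
 * hypothetical sparsity pattern (not SpComp's generated code). */
#include <stdio.h>

/* Generic CSR SpMV: y = A*x with indirection through rowptr/colidx. */
void spmv_csr(int n, const int *rowptr, const int *colidx,
              const double *val, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[colidx[k]];   /* indirect reference x[colidx[k]] */
        y[i] = sum;
    }
}

/* Sparsity-specific SpMV for the hypothetical 4x4 pattern
 *   row 0: cols {0, 2}, row 1: col {1}, row 2: cols {0, 3}, row 3: col {3}.
 * The non-zero positions are baked into the code, so no index arrays
 * are read at run time. */
void spmv_specialized(const double *val, const double *x, double *y) {
    y[0] = val[0] * x[0] + val[1] * x[2];
    y[1] = val[2] * x[1];
    y[2] = val[3] * x[0] + val[4] * x[3];
    y[3] = val[5] * x[3];
}

int main(void) {
    /* CSR encoding of the same hypothetical 4x4 matrix. */
    int rowptr[] = {0, 2, 3, 5, 6};
    int colidx[] = {0, 2, 1, 0, 3, 3};
    double val[] = {1, 2, 3, 4, 5, 6};
    double x[] = {1, 1, 1, 1}, y1[4], y2[4];

    spmv_csr(4, rowptr, colidx, val, x, y1);
    spmv_specialized(val, x, y2);
    for (int i = 0; i < 4; i++)
        printf("row %d: generic %.1f, specialized %.1f\n", i, y1[i], y2[i]);
    return 0;
}
```

Both kernels compute the same result; the specialized one trades generality for the absence of indirect memory references, which is the property the abstract attributes to SpComp's piecewise-regular loops.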
- 2015. Guide to NumPy (2nd ed.). CreateSpace Independent Publishing Platform, USA. 364 pages.
- Multifrontal Parallel Distributed Symmetric and Unsymmetric Solvers. Comput. Methods Appl. Mech. Eng. 184 (1998), 501–520.
- Algorithm 837: AMD, an Approximate Minimum Degree Ordering Algorithm. ACM Trans. Math. Softw. 30, 3 (Sept. 2004), 381–388. https://doi.org/10.1145/1024074.1024081
- Generating piecewise-regular code from irregular structures. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’19). 625–639. https://doi.org/10.1145/3314221.3314615
- Utpal K. Banerjee. 1988. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, USA.
- C. Bastoul. 2004. Code generation in the polyhedral model is easier than you think. In Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004. 7–16. https://doi.org/10.1109/PACT.2004.1342537
- Putting Polyhedral Loop Transformations to Work. In Languages and Compilers for Parallel Computing, Lawrence Rauchwerger (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 209–225.
- Ayon Basumallik and Rudolf Eigenmann. 2006. Optimizing Irregular Shared-Memory Applications for Distributed-Memory Systems. In Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New York, New York, USA) (PPoPP ’06). Association for Computing Machinery, New York, NY, USA, 119–128. https://doi.org/10.1145/1122971.1122990
- Run-Time Parallelization and Scheduling of Loops. In Proceedings of the First Annual ACM Symposium on Parallel Algorithms and Architectures (Santa Fe, New Mexico, USA) (SPAA ’89). Association for Computing Machinery, New York, NY, USA, 303–312. https://doi.org/10.1145/72935.72967
- The Polyhedral Model Is More Widely Applicable Than You Think. In Compiler Construction, Rajiv Gupta (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 283–303.
- The Sparse Compiler MT1: A Reference Guide. Citeseer.
- Aart JC Bik and Harry AG Wijshoff. 1993. Compilation techniques for sparse matrix computations. In Proceedings of the 7th international conference on Supercomputing. 416–424.
- The Automatic Generation of Sparse Primitives. ACM Trans. Math. Softw. 24, 2 (June 1998), 190–225. https://doi.org/10.1145/290200.287636
- Reshaping Access Patterns for Generating Sparse Codes. In Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing (LCPC ’94). Springer-Verlag, Berlin, Heidelberg, 406–420.
- Aart J. C. Bik and Harry A. G. Wijshoff. 1994a. Nonzero Structure Analysis. In Proceedings of the 8th International Conference on Supercomputing (Manchester, England) (ICS ’94). Association for Computing Machinery, New York, NY, USA, 226–235. https://doi.org/10.1145/181181.181538
- A. J. C. Bik and H. A. G. Wijshoff. 1996. Automatic data structure selection and transformation for sparse matrix computations. IEEE Transactions on Parallel and Distributed Systems 7, 2 (Feb 1996), 109–126. https://doi.org/10.1109/71.485501
- Aart J. C. Bik and Harry A. G. Wijshoff. 1994b. On automatic data structure selection and code generation for sparse computations. In Languages and Compilers for Parallel Computing, Utpal Banerjee, David Gelernter, Alex Nicolau, and David Padua (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 57–75.
- Autotuning Sparse Matrix-Vector Multiplication for Multicore.
- Optimizing Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures. arXiv:1805.11938 [cs.MS]
- An adaptive LU factorization algorithm for parallel circuit simulation. In 17th Asia and South Pacific Design Automation Conference. 359–364.
- Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate. ACM Trans. Math. Softw. 35, 3 (2008), 1–14. http://dx.doi.org/10.1145/1391989.1391995
- Vectorizing sparse matrix computations with partially-strided codelets. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. 1–15.
- Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis. (05 2017).
- ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. 779–793. https://doi.org/10.1109/SC.2018.00065
- Ching-Hsien Hsu. 2002. Optimization of sparse matrix redistribution on multicomputers. In Proceedings. International Conference on Parallel Processing Workshop. 615–622. https://doi.org/10.1109/ICPPW.2002.1039784
- Tim Davis. 2023a. SuiteSparse Matrix Collection. https://sparse.tamu.edu/.
- Timothy A. Davis. 2006. Direct Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
- Timothy A. Davis. 2023b. SuiteSparse : a suite of sparse matrix software. http://faculty.cse.tamu.edu/davis/suitesparse.html.
- Algorithm 836: COLAMD, a Column Approximate Minimum Degree Ordering Algorithm. ACM Trans. Math. Softw. 30, 3 (Sept. 2004), 377–380. https://doi.org/10.1145/1024074.1024080
- T. A. Davis and E. Palamadai Natarajan. 2010. Algorithm 907: KLU, A Direct Sparse Solver for Circuit Simulation Problems. ACM Trans. Math. Softw. 37, 3 (Sept. 2010), 36:1–36:17. http://dx.doi.org/10.1145/1824801.1824814
- A survey of direct methods for sparse linear systems. Acta Numerica 25 (2016), 383–566. https://doi.org/10.1017/S0962492916000076
- A Supernodal Approach to Sparse Partial Pivoting. Technical Report. USA.
- A Sparse Matrix Library in C++ for High Performance Architectures. Proceedings of the Second Object Oriented Numerics Conference (05 1997).
- Victor Eijkhout. 1992. LAPACK Working Note 50: Distributed Sparse Data Structures for Linear Algebra Operations. Technical Report. Knoxville, TN, USA.
- Hardware implementation of LU decomposition using dataflow architecture on FPGA. In 2013 5th International Conference on Computer Science and Information Technology. 298–302.
- A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication. In 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines. 36–43.
- Parallel sparse LU decomposition using FPGA with an efficient cache architecture. In 2017 IEEE 12th International Conference on ASIC (ASICON). 259–262. https://doi.org/10.1109/ASICON.2017.8252462
- Alan George and Wai-Hung Liu. 1975. A Note on Fill for Sparse Matrices. SIAM J. Numer. Anal. 12, 3 (1975), 452–455. http://www.jstor.org/stable/2156057
- Accelerating SpMV on FPGAs by Compressing Nonzero Values. In 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines. 64–67.
- Eigen v3. http://eigen.tuxfamily.org.
- High Performance Sparse LU Solver FPGA Accelerator Using a Static Synchronous Data Flow Model. In 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines. 29–29.
- Rawn Tristan Henry. 2020. A framework for computing on sparse tensors based on operator properties. Ph. D. Dissertation. USA.
- PaStiX: A High-Performance Parallel Direct Solver for Sparse Symmetric Positive Definite Systems. Parallel Comput. 28 (02 2002), 301–321. https://doi.org/10.1016/S0167-8191(01)00141-7
- Intel. 2023. Intel Math Kernel Library. https://software.intel.com/en-us/mkl.
- FPGA acceleration of Sparse Matrix-Vector Multiplication based on Network-on-Chip. In 2011 19th European Signal Processing Conference. 744–748.
- Automatic parallelization of the conjugate gradient algorithm. In Languages and Compilers for Parallel Computing. Springer Berlin Heidelberg, Berlin, Heidelberg, 480–499.
- Optimization by Runtime Specialization for Sparse Matrix-vector Multiplication. SIGPLAN Not. 50, 3 (Sept. 2014), 93–102. https://doi.org/10.1145/2775053.2658773
- Nachiket Kapre and André DeHon. 2009. Parallelizing sparse Matrix Solve for SPICE circuit simulation using FPGAs. In Proc. Field-Programmable Technology. 190–198.
- Ken Kennedy and John R. Allen. 2001. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
- Fredrik Kjolstad and Saman Amarasinghe. 2023. TACO: The Tensor Algebra Compiler. http://tensor-compiler.org/.
- Taco: A tool to generate tensor algebra kernels. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). 943–948. https://doi.org/10.1109/ASE.2017.8115709
- Vladimir Kotlyar. 1999. Relational Algebraic Techniques for the Synthesis of Sparse Matrix Programs. Ph. D. Dissertation. USA. Advisor(s) Pingali, Keshav. AAI9910244.
- Vladimir Kotlyar and Keshav Pingali. 1997. Sparse Code Generation for Imperfectly Nested Loops with Dependences. In Proceedings of the 11th International Conference on Supercomputing (Vienna, Austria) (ICS ’97). Association for Computing Machinery, New York, NY, USA, 188–195. https://doi.org/10.1145/263580.263630
- Compiling Parallel Code for Sparse Matrix Applications. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing (San Jose, CA) (SC ’97). Association for Computing Machinery, New York, NY, USA, 1–18. https://doi.org/10.1145/509593.509603
- A Relational Approach to the Compilation of Sparse Matrix Programs. Technical Report. USA.
- Seyong Lee and Rudolf Eigenmann. 2008. Adaptive Runtime Tuning of Parallel Sparse Matrix-vector Multiplication on Distributed Memory Systems. In Proceedings of the 22nd Annual International Conference on Supercomputing (Island of Kos, Greece) (ICS ’08). ACM, New York, NY, USA, 195–204. https://doi.org/10.1145/1375527.1375558
- Automatic tuning of sparse matrix-vector multiplication on multicore clusters. Science China Information Sciences 58, 9 (01 Sep 2015), 1–14. https://doi.org/10.1007/s11432-014-5254-x
- Next-Generation Generic Programming and Its Application to Sparse Matrix Computations. In Proceedings of the 14th International Conference on Supercomputing (Santa Fe, New Mexico, USA) (ICS ’00). Association for Computing Machinery, New York, NY, USA, 88–99. https://doi.org/10.1145/335231.335240
- Principles of run-time support for parallel processors. Proceedings of the 1988 ACM International Conference on Supercomputing (July 1988), 140–152.
- Sparse Matrix Code Dependence Analysis Simplification at Compile Time. CoRR abs/1807.10852 (2018). arXiv:1807.10852 http://arxiv.org/abs/1807.10852
- Sparse Computation Data Dependence Simplification for Efficient Compiler-generated Inspectors. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). ACM, New York, NY, USA, 594–609. https://doi.org/10.1145/3314221.3314646
- Anders Møller and Michael I. Schwartzbach. 2015. Static Program Analysis. Department of Computer Science, Aarhus University.
- Abstractions for specifying sparse matrix data transformations. In Proc. 8th Int. Workshop Polyhedral Compilation Techn. (IMPACT). 1–10.
- T. Nechma and M. Zwolinski. 2015. Parallel Sparse Matrix Solution for Circuit Simulation on FPGAs. IEEE Trans. Comput. 64, 4 (2015), 1090–1103.
- Michael Norrish and Michelle Mills Strout. 2015. An Approach for Proving the Correctness of Inspector/Executor Transformations. In Languages and Compilers for Parallel Computing, James Brodman and Peng Tu (Eds.). Springer International Publishing, Cham, 131–145.
- A sparse matrix vector multiply accelerator for support vector machine. In 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES). 109–116.
- NVIDIA. 2023. CUSPARSE. https://developer.nvidia.com/cusparse.
- Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver. In Supercomputing, Julian Martin Kunkel, Thomas Ludwig, and Hans Werner Meuer (Eds.). Springer International Publishing, Cham, 124–140.
- PaStiX. 2023. PaStiX. http://pastix.gforge.inria.fr/files/README-txt.html.
- Runtime Compilation Techniques for Data Partitioning and Communication Schedule Reuse. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing (Portland, Oregon, USA) (Supercomputing ’93). Association for Computing Machinery, New York, NY, USA, 361–370. https://doi.org/10.1145/169627.169752
- Louis-Noël Pouchet and Gabriel Rodríguez. 2018. Polyhedral modeling of immutable sparse matrices.
- SparseLib++ Sparse Matrix Class Library. https://math.nist.gov/sparselib++/.
- Lawrence Rauchwerger. 1998. Run-time parallelization: Its time has come. Parallel Comput. 24, 3 (1998), 527–556. https://doi.org/10.1016/S0167-8191(98)00024-6
- Distributed Memory Code Generation for Mixed Irregular/Regular Computations. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (San Francisco, CA, USA) (PPoPP 2015). Association for Computing Machinery, New York, NY, USA, 65–75. https://doi.org/10.1145/2688500.2688515
- Gabriel Rodríguez. 2022. poly-spmv. https://gitlab.com/grodriguez.udc/poly-spmv.
- Youcef Saad. 1994. SPARSKIT: a basic tool kit for sparse matrix computations - Version 2.
- J. Saltz and R. Mirchandaney. 1991. The preprocessed doacross loop. Proceedings of the Int. Conf. Parallel Processing (ICPP) 2 (August 1991), 174–179.
- Olaf Schenk and Klaus Gärtner. 2004. Solving Unsymmetric Sparse Systems of Linear Equations with PARDISO. Future Gener. Comput. Syst. 20, 3 (April 2004), 475–487. https://doi.org/10.1016/j.future.2003.07.011
- Efficient Sparse LU Factorization with Left-Right Looking Strategy on Shared Memory Multiprocessors. BIT Numerical Mathematics 40 (2000), 158–176.
- Paul Vinson Stodghill. 1997. A Relational Approach to the Automatic Generation of Sequential Sparse Matrix Codes. Ph. D. Dissertation. USA.
- The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code. Proc. IEEE 106 (Nov 2018), 1921–1934. https://doi.org/10.1109/JPROC.2018.2857721
- An approach for code generation in the Sparse Polyhedral Framework. Parallel Comput. 53 (2016), 32–57. https://doi.org/10.1016/j.parco.2016.02.004
- EGGS: Sparsity‐Specific Code Generation. Computer Graphics Forum 39 (08 2020), 209–219. https://doi.org/10.1111/cgf.14080
- Collecting Performance Data with PAPI-C. https://icl.utk.edu/papi/.
- Run-time techniques for parallelizing sparse matrix problems. In Parallel Algorithms for Irregularly Structured Problems, Afonso Ferreira and José Rolim (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 43–57.
- Loop and Data Transformations for Sparse Matrix Code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (Portland, OR, USA) (PLDI ’15). ACM, New York, NY, USA, 521–532. https://doi.org/10.1145/2737924.2738003
- Automating Wavefront Parallelization for Sparse Matrix Computations. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Salt Lake City, Utah) (SC ’16). IEEE Press, Piscataway, NJ, USA, Article 41, 12 pages. http://dl.acm.org/citation.cfm?id=3014904.3014959
- Non-Affine Extensions to Polyhedral Code Generation. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (Orlando, FL, USA) (CGO ’14). Association for Computing Machinery, New York, NY, USA, 185–194. https://doi.org/10.1145/2581122.2544141
- Piotr Wendykier and James G. Nagy. 2010. Parallel Colt: A High-Performance Java Library for Scientific Computing and Image Processing. TOMS 37 (September 2010). https://doi.org/10.1145/1824801.1824809
- FPGA Accelerated Parallel Sparse Matrix Factorization for Circuit Simulations. In Reconfigurable Computing: Architectures, Tools and Applications, Andreas Koch, Ram Krishnamurthy, John McAllister, Roger Woods, and Tarek El-Ghazawi (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 302–315.
- IA-SpGEMM: An Input-aware Auto-tuning Framework for Parallel Sparse Matrix-matrix Multiplication. In Proceedings of the ACM International Conference on Supercomputing (Phoenix, Arizona) (ICS ’19). ACM, New York, NY, USA, 94–105. https://doi.org/10.1145/3330345.3330354
- Exploiting Parallelism with Dependence-Aware Scheduling. In 2009 18th International Conference on Parallel Architectures and Compilation Techniques. 193–202. https://doi.org/10.1109/PACT.2009.10
- Run-time optimization of sparse matrix-vector multiplication on SIMD machines. In PARLE’94 Parallel Architectures and Languages Europe, Costas Halatsis, Dimitrios Maritsas, George Philokyprou, and Sergios Theodoridis (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 313–322.
Authors: Barnali Basak, Uday P. Khedker, Supratim Biswas