SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring (2311.09549v3)
Abstract: Automated code generation and performance optimization for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to generate asymptotically better schedules for complex sparse tensor expressions using kernel fission and fusion. We present generalized loop-restructuring transformations that reduce asymptotic time complexity and memory footprint. Furthermore, we present an auto-scheduler with a partially ordered set (poset)-based cost model that uses both time and auxiliary-memory complexities to prune the search space of schedules. In addition, we show how Satisfiability Modulo Theories (SMT) solvers can be used in sparse auto-schedulers to narrow the Pareto frontier of candidate schedules down to the smallest possible set, subject to user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select better-performing schedules and generate code for them. Our results show that the auto-scheduler-provided schedules achieve orders-of-magnitude speedups over the code generated by the Tensor Algebra Compiler (TACO) for several computations on different real-world tensors.
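The abstract's poset-based cost model compares schedules on two incomparable axes, asymptotic time and auxiliary memory, and keeps only the Pareto-optimal candidates. The following is a minimal sketch of that pruning idea, not SparseAuto's implementation; the schedule names and complexity degrees are illustrative assumptions.

```python
# Hypothetical schedule candidates, each annotated with an asymptotic time
# degree and an auxiliary-memory degree (e.g., exponents of the dominant
# nnz/dimension terms). Lower is better on both axes; because the two axes
# are incomparable, the cost model induces a partial order, not a total one.
schedules = [
    {"name": "fused",     "time": 3, "memory": 1},
    {"name": "unfused",   "time": 4, "memory": 0},
    {"name": "workspace", "time": 3, "memory": 2},
    {"name": "naive",     "time": 5, "memory": 2},
]

def dominates(a, b):
    """a dominates b if a is no worse on both axes and strictly better on one."""
    return (a["time"] <= b["time"] and a["memory"] <= b["memory"]
            and (a["time"] < b["time"] or a["memory"] < b["memory"]))

def pareto_frontier(candidates):
    """Keep every schedule that no other candidate dominates."""
    return [s for s in candidates
            if not any(dominates(t, s) for t in candidates if t is not s)]

frontier = pareto_frontier(schedules)
# "workspace" and "naive" are dominated by "fused" and drop out; "fused" and
# "unfused" are mutually incomparable, so both survive for the user (or an
# SMT-constrained search) to choose between.
```

In the paper's setting, compile-time constraints (e.g., a memory budget) would then be handed to an SMT solver to shrink this frontier further; here the dominance check alone already discards strictly worse schedules.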
- High-performance Tensor Contractions for GPUs. Procedia Computer Science 80 (2016), 108–118. https://doi.org/10.1016/j.procs.2016.05.302 International Conference on Computational Science 2016, ICCS 2016, 6-8 June 2016, San Diego, California, USA.
- Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (San Diego, CA, USA) (PLDI 2022). Association for Computing Machinery, New York, NY, USA, 269–285. https://doi.org/10.1145/3519939.3523442
- Memory minimization for tensor contractions using integer linear programming. In Proceedings 20th IEEE International Parallel Distributed Processing Symposium. 8 pp. https://doi.org/10.1109/IPDPS.2006.1639717
- Automatic code generation for many-body electronic structure methods: the tensor contraction engine. Molecular Physics 104, 2 (2006), 211–228. https://doi.org/10.1080/00268970500275780 arXiv:https://doi.org/10.1080/00268970500275780
- Compiler Support for Sparse Tensor Computations in MLIR. ACM Trans. Archit. Code Optim. 19, 4, Article 50 (sep 2022), 25 pages. https://doi.org/10.1145/3544559
- Aart J. C. Bik and Harry A. G. Wijshoff. 1993. Compilation Techniques for Sparse Matrix Computations. In Proceedings of the 7th International Conference on Supercomputing (Tokyo, Japan) (ICS ’93). Association for Computing Machinery, New York, NY, USA, 416–424. https://doi.org/10.1145/165939.166023
- Format Abstraction for Sparse Tensor Algebra Compilers. Proc. ACM Program. Lang. 2, OOPSLA, Article 123 (oct 2018), 30 pages. https://doi.org/10.1145/3276493
- Global communication optimization for tensor contraction expressions under memory constraints. In Proceedings International Parallel and Distributed Processing Symposium. 8 pp. https://doi.org/10.1109/IPDPS.2003.1213121
- Evaluating Intrusion Detection Systems without Attacking your Friends: The 1998 DARPA Intrusion Detection Evaluation. Proceedings of the 2000 DARPA Information Survivability Conference and Exposition (DISCEX’00) (2000), 12–26. https://doi.org/10.1109/DISCEX.2000.823339
- Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38, 1, Article 1 (dec 2011), 25 pages. https://doi.org/10.1145/2049662.2049663
- SparseLNR: accelerating sparse tensor computations using loop nest restructuring. In Proceedings of the 36th ACM International Conference on Supercomputing.
- Johnnie Gray and Stefanos Kourtis. 2021. Hyper-optimized tensor network contraction. Quantum 5 (March 2021), 410. https://doi.org/10.22331/q-2021-03-15-410
- Inductive representation learning on large graphs. Advances in neural information processing systems 30 (2017), 1024–1034.
- Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry. The Journal of Physical Chemistry A 113, 45 (2009), 12715–12723. https://doi.org/10.1021/jp9051215 arXiv:https://doi.org/10.1021/jp9051215 PMID: 19888780.
- So Hirata. 2003. Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body Perturbation Theories. The Journal of Physical Chemistry A 107, 46 (2003), 9887–9897. https://doi.org/10.1021/jp034596z
- FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Atlanta, Georgia) (SC ’20). IEEE Press, Article 71, 13 pages.
- Raghavendra Kanakagiri and Edgar Solomonik. 2023. Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network. https://doi.org/10.48550/arXiv.2307.05740
- A Code Generator for High-Performance Tensor Contractions on GPUs. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 85–95. https://doi.org/10.1109/CGO.2019.8661182
- Tensor Algebra Compilation with Workspaces. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 180–192. http://dl.acm.org/citation.cfm?id=3314872.3314894
- The Tensor Algebra Compiler. Proc. ACM Program. Lang. 1, OOPSLA, Article 77 (Oct. 2017), 29 pages. https://doi.org/10.1145/3133901
- Tensor Contraction Layers for Parsimonious Deep Nets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
- A Relational Approach to the Compilation of Sparse Matrix Programs. In Proceedings of the Third International Euro-Par Conference on Parallel Processing (Euro-Par ’97). Springer-Verlag, Berlin, Heidelberg, 318–327.
- On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution. Parallel Process. Lett. 7 (1997), 157–168. https://api.semanticscholar.org/CorpusID:9440379
- Igor L. Markov and Yaoyun Shi. 2008. Simulating Quantum Computation by Contracting Tensor Networks. SIAM J. Comput. 38, 3 (2008), 963–981. https://doi.org/10.1137/050644756
- Generating Efficient Tensor Contractions for GPUs. In 2015 44th International Conference on Parallel Processing. 969–978. https://doi.org/10.1109/ICPP.2015.106
- FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks. In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 256–266. https://doi.org/10.1109/IPDPS49936.2021.00034
- Review of tensor network contraction approaches. arXiv preprint arXiv:1708.09213 (2017).
- Tensor Network Contractions: Methods and Applications to Quantum Many-Body Systems. Springer Nature. https://doi.org/10.1007/978-3-030-34489-4
- The Network Data Repository with Interactive Graph Analytics and Visualization. 42, 1 (2015). https://doi.org/10.1145/2740908
- Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions. In SC ’05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. 13–13. https://doi.org/10.1109/SC.2005.35
- A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra. Proc. ACM Program. Lang. 4, OOPSLA, Article 158 (Nov. 2020), 30 pages. https://doi.org/10.1145/3428226
- FROSTT: The Formidable Repository of Open Sparse Tensors and Tools. http://frostt.io/
- A High Performance Sparse Tensor Algebra Compiler in MLIR. In 2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC). 27–38. https://doi.org/10.1109/LLVMHPC54804.2021.00009
- Ledyard R. Tucker. 1966. Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika 31, 3 (1966), 279–311. https://doi.org/10.1007/BF02289464
- Loop and Data Transformations for Sparse Matrix Code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (Portland, OR, USA) (PLDI ’15). Association for Computing Machinery, New York, NY, USA, 521–532. https://doi.org/10.1145/2737924.2738003
- SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning. arXiv:2207.04606 [cs.LG]
- ReACT: Redundancy-Aware Code Generation for Tensor Expressions. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (Chicago, Illinois) (PACT ’22). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3559009.3569685