
FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction (2404.16317v1)

Published 25 Apr 2024 in cs.AR and cs.LG

Abstract: Tensors play a vital role in ML and often exhibit properties best explored while maintaining high-order. Efficiently performing ML computations requires taking advantage of sparsity, but generalized hardware support is challenging. This paper introduces FLAASH, a flexible and modular accelerator design for sparse tensor contraction that achieves over 25x speedup for a deep learning workload. Our architecture performs sparse high-order tensor contraction by distributing sparse dot products, or portions thereof, to numerous Sparse Dot Product Engines (SDPEs). Memory structure and job distribution can be customized, and we demonstrate a simple approach as a proof of concept. We address the challenges associated with control flow to navigate data structures, high-order representation, and high-sparsity handling. The effectiveness of our approach is demonstrated through various evaluations, showcasing significant speedup as sparsity and order increase.
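
The abstract describes FLAASH's core decomposition at a high level: a sparse high-order tensor contraction is broken into many independent sparse dot products over the contracted modes, which are then distributed across Sparse Dot Product Engines (SDPEs). The following is a minimal software sketch of that decomposition only, not the paper's hardware, memory structure, or data formats; the COO-style dictionary representation, the function names, and the round-robin job assignment are assumptions made purely for illustration.

    # Conceptual sketch (not FLAASH hardware): contract two sparse high-order
    # tensors stored as COO-style dicts by forming independent sparse dot
    # products and handing them out round-robin to a pool of software "SDPEs".
    from collections import defaultdict

    def sparse_contract(A, B, contract_a, contract_b, num_sdpes=4):
        """A, B: dicts mapping index tuples -> nonzero values.
        contract_a / contract_b: positions of the contracted modes."""
        def group(T, contracted):
            # Group nonzeros by their free (uncontracted) indices; within each
            # group, key by the contracted-index tuple so a sparse dot product
            # becomes an intersection of keys.
            groups = defaultdict(dict)
            for idx, val in T.items():
                free = tuple(i for p, i in enumerate(idx) if p not in contracted)
                con = tuple(idx[p] for p in sorted(contracted))
                groups[free][con] = val
            return groups

        Ag = group(A, set(contract_a))
        Bg = group(B, set(contract_b))

        # Every (free_a, free_b) pair is one independent sparse dot product job.
        jobs = [(fa, fb) for fa in Ag for fb in Bg]
        out = {}
        for j, (fa, fb) in enumerate(jobs):
            sdpe_id = j % num_sdpes  # engine that would run this job (illustrative only)
            acc = sum(v * Bg[fb][k] for k, v in Ag[fa].items() if k in Bg[fb])
            if acc != 0.0:
                out[fa + fb] = acc
        return out

    # Example: contract a 3rd-order tensor (i, j, k) with a matrix (k, m) over k.
    A = {(0, 1, 2): 2.0, (1, 0, 2): 3.0}
    B = {(2, 0): 5.0}
    print(sparse_contract(A, B, contract_a=[2], contract_b=[0]))
    # {(0, 1, 0): 10.0, (1, 0, 0): 15.0}

In this sketch the sparsity benefit comes from iterating only stored nonzeros and intersecting contracted-index keys, and the independence of the per-pair jobs is what makes distribution across many engines straightforward; the paper's contribution is realizing this in hardware with customizable memory structure and job distribution.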
