
Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition (2403.07953v2)

Published 12 Mar 2024 in cs.LG, cs.AI, and cs.AR

Abstract: Exploiting sparsity in deep neural networks (DNNs) is a promising approach to meeting the growing computation demands of modern DNNs. In practice, however, sparse DNN acceleration still faces a key challenge. To minimize the overhead of sparse acceleration, hardware designers have recently proposed structured sparse hardware support, which offers only limited flexibility and requires extra model fine-tuning; moreover, a sparse model fine-tuned for one structured sparse hardware design cannot be accelerated by other structured hardware. To bridge the gap between sparse DNN models and hardware, this paper proposes tensor approximation via structured decomposition (TASD), which leverages the distributive property of linear algebra to turn any sparse tensor into a series of structured sparse tensors. Building on TASD, we develop a software framework, TASDER, which accelerates DNNs by searching for layer-wise, high-quality structured decompositions of both weight and activation tensors so that they can be accelerated by any system with structured sparse hardware support. Evaluation results show that, on top of prior structured sparse hardware baselines, our method accelerates off-the-shelf dense and sparse DNNs without fine-tuning and improves energy-delay product by up to 83% and by 74% on average.
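The key mechanism described in the abstract, expressing an arbitrary sparse tensor as a sum of structured sparse tensors and relying on the distributive property so each term can run on structured sparse hardware, can be illustrated with a minimal sketch. The NumPy snippet below is an assumed toy (greedy residual projection onto 2:4 sparsity, with hypothetical helpers `nm_structured` and `tasd_decompose`), not the paper's TASDER framework.

```python
import numpy as np

def nm_structured(mat, n=2, m=4):
    """Project `mat` onto N:M structured sparsity: within every group of m
    consecutive elements along each row, keep the n largest-magnitude
    entries and zero the rest."""
    rows, cols = mat.shape
    assert cols % m == 0, "column count must be divisible by m"
    groups = mat.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=-1)
    return pruned.reshape(rows, cols)

def tasd_decompose(tensor, terms=2, n=2, m=4):
    """Approximate `tensor` as a sum of `terms` N:M structured tensors by
    repeatedly projecting the running residual (illustrative sketch only)."""
    series, residual = [], tensor
    for _ in range(terms):
        t_i = nm_structured(residual, n, m)
        series.append(t_i)
        residual = residual - t_i
    return series

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
W[np.abs(W) < 0.8] = 0.0            # an unstructured sparse tensor
x = rng.standard_normal(16)

series = tasd_decompose(W, terms=2)
# Distributive property: (T1 + T2) @ x == T1 @ x + T2 @ x, so each
# structured term can be dispatched to an N:M sparse engine separately
# and the partial results accumulated.
approx = sum(t @ x for t in series)
print(np.linalg.norm(W @ x - approx) / np.linalg.norm(W @ x))
```

In this toy, two 2:4 terms over groups of four columns already cover every nonzero, so the accumulated partial products match the dense result; with fewer terms the series becomes an approximation whose quality the layer-wise search in TASDER is described as trading off against hardware efficiency.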

Authors (5)
  1. Geonhwa Jeong (12 papers)
  2. Po-An Tsai (10 papers)
  3. Abhimanyu R. Bambhaniya (1 paper)
  4. Stephen W. Keckler (19 papers)
  5. Tushar Krishna (87 papers)
Citations (2)
