
Inducing Semi-Structured Sparsity by Masking for Efficient Model Inference in Convolutional Networks (2411.00288v1)

Published 1 Nov 2024 in cs.LG, cs.AI, cs.CV, cs.NE, and cs.PF

Abstract: The crucial role of convolutional models, both as standalone vision models and as backbones in foundation models, necessitates effective acceleration techniques. This paper proposes a novel method to learn semi-structured sparsity patterns for convolution kernels in the form of maskings, enabling the use of readily available hardware acceleration. The approach accelerates convolutional models more than two-fold during inference without decreasing model performance. At the same time, the original model weights and structure remain unchanged, keeping the model easily updatable. Beyond the immediate practical use, the effect of the maskings on predictions is easily quantifiable. Guarantees on model predictions under masking are therefore derived, showing stability bounds for learned maskings even after the underlying model has been updated.
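To make the masking idea concrete, below is a minimal PyTorch sketch (not the authors' implementation) of applying a 2:4 semi-structured mask to a convolution at inference time while leaving the stored weights untouched. The mask here is chosen by a simple magnitude heuristic as a stand-in for the learned maskings described in the paper, and the helper names (`make_2to4_mask`, `MaskedConv2d`) are hypothetical. Actual hardware speedups additionally require dispatching the masked weights to 2:4 sparse tensor-core kernels (e.g., via TensorRT), which this sketch does not do.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_2to4_mask(weight: torch.Tensor) -> torch.Tensor:
    """Build a binary 2:4 mask for a Conv2d weight of shape (O, C, kH, kW):
    in every group of 4 consecutive entries along the flattened C*kH*kW
    axis, keep the 2 largest-magnitude weights (a magnitude heuristic
    standing in for the learned masking)."""
    out_ch = weight.shape[0]
    flat = weight.detach().abs().reshape(out_ch, -1)        # (O, C*kH*kW)
    assert flat.shape[1] % 4 == 0, "flattened kernel size must be divisible by 4"
    groups = flat.reshape(out_ch, -1, 4)                    # (O, G, 4)
    keep = groups.topk(2, dim=-1).indices                   # 2 kept positions per group
    mask = torch.zeros_like(groups).scatter_(-1, keep, 1.0)
    return mask.reshape_as(weight)


class MaskedConv2d(nn.Module):
    """Wraps an existing Conv2d and multiplies its weights by a fixed 2:4
    mask on the fly; the wrapped layer's parameters are never modified, so
    the original dense model stays intact and updatable."""

    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        self.register_buffer("mask", make_2to4_mask(conv.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.conv.weight * self.mask
        return F.conv2d(x, w, self.conv.bias, self.conv.stride,
                        self.conv.padding, self.conv.dilation, self.conv.groups)


# Usage: 16 * 3 * 3 = 144 weights per output channel, divisible by 4.
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1, bias=False)
masked = MaskedConv2d(conv)
out = masked(torch.randn(1, 16, 8, 8))
print(out.shape)  # torch.Size([1, 32, 8, 8])
```

Because the mask enters only as an elementwise multiplication of the dense weights, the perturbation it induces on a layer's output is straightforward to analyze, which is in line with the abstract's claim that the effect of maskings on predictions is easily quantifiable.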

