MaxQ: Multi-Axis Query for N:M Sparsity Network (2312.07061v2)

Published 12 Dec 2023 in cs.CV

Abstract: N:M sparsity has received increasing attention due to its remarkable performance and latency trade-off compared with structured and unstructured sparsity. However, existing N:M sparsity methods do not differentiate the relative importance of weights among blocks and leave important weights underappreciated. Besides, they directly apply N:M sparsity to the whole network, which will cause severe information loss. Thus, they are still sub-optimal. In this paper, we propose an efficient and effective Multi-Axis Query methodology, dubbed as MaxQ, to rectify these problems. During the training, MaxQ employs a dynamic approach to generate soft N:M masks, considering the weight importance across multiple axes. This method enhances the weights with more importance and ensures more effective updates. Meanwhile, a sparsity strategy that gradually increases the percentage of N:M weight blocks is applied, which allows the network to heal from the pruning-induced damage progressively. During the runtime, the N:M soft masks can be precomputed as constants and folded into weights without causing any distortion to the sparse pattern and incurring additional computational overhead. Comprehensive experiments demonstrate that MaxQ achieves consistent improvements across diverse CNN architectures in various computer vision tasks, including image classification, object detection and instance segmentation. For ResNet50 with 1:16 sparse pattern, MaxQ can achieve 74.6\% top-1 accuracy on ImageNet and improve by over 2.8\% over the state-of-the-art. Codes and checkpoints are available at \url{https://github.com/JingyangXiang/MaxQ}.

Enhancing Neural Networks with Multi-Axis Query Sparsity

Introduction to N:M Sparsity

Deep convolutional neural networks (CNNs) have made tremendous strides in various computer vision tasks, yet their deployment is often hampered by high memory and computational demands, which are especially limiting on mobile and edge devices. Network sparsity has emerged as an effective remedy, offering both memory and computation savings. Among sparsity techniques, the N:M pattern has drawn particular interest because it balances accuracy and latency well: only N out of every M consecutive weights are kept nonzero, yielding a fine-grained yet hardware-friendly sparsity structure.
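To make the 2:4 case concrete, the following sketch keeps the two largest-magnitude weights in every group of four consecutive input weights of a linear layer and zeroes the rest; the function name and tensor layout are illustrative assumptions, not taken from the MaxQ codebase.

```python
import torch

def nm_prune(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Hard N:M pruning sketch: in every group of M consecutive weights
    along the input dimension, keep the N largest-magnitude entries and
    zero out the rest."""
    out_ch, in_ch = weight.shape
    assert in_ch % m == 0, "input dimension must be divisible by M"
    blocks = weight.reshape(out_ch, in_ch // m, m)            # group into N:M blocks
    idx = blocks.abs().topk(n, dim=-1).indices                # top-N magnitudes per block
    mask = torch.zeros_like(blocks).scatter_(-1, idx, 1.0)    # binary keep-mask
    return (blocks * mask).reshape(out_ch, in_ch)

# Example: a 2:4-sparse weight for an 8x16 linear layer.
w_sparse = nm_prune(torch.randn(8, 16), n=2, m=4)
```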

Despite its promise, earlier N:M sparsity methods did not differentiate the relative importance of weights across blocks, leaving important weights under-emphasized, and they imposed the N:M constraint on the whole network at once, causing severe information loss. Both issues leave their results sub-optimal.

MaxQ: A Multi-Axis Query Approach

The paper introduces a Multi-Axis Query methodology, named MaxQ, designed to address these limitations. Unlike earlier methods that consider each N:M block in isolation, MaxQ assesses weight importance across multiple axes to identify the more significant connections within a network.

MaxQ operates dynamically, generating 'soft' N:M masks throughout training. These masks up-weight the more important connections so that their updates are not suppressed and crucial weights are not undervalued. A particularly notable aspect of MaxQ is its sparsity schedule: the proportion of weight blocks constrained to the N:M pattern increases gradually as training progresses. This incremental approach lets the network recover step by step from the damage caused by pruning, leading to more stable and efficient training.
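The precise importance measure, mask formula, and schedule are those defined in the paper; purely as an illustration of the idea, the sketch below scores weights along two axes (per block and per filter), scales the kept weights by a soft factor derived from those scores, and constrains only a growing fraction of blocks to the N:M pattern. Every name, formula, and hyperparameter here is an assumption for exposition, not the authors' implementation.

```python
import torch

def soft_nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4,
                 block_ratio: float = 1.0, alpha: float = 1.0) -> torch.Tensor:
    """Hypothetical soft N:M mask: kept weights get a factor >= 1 based on
    magnitude scores queried along the block and filter axes; only the
    `block_ratio` fraction of lowest-scoring blocks is constrained to N:M."""
    out_ch, in_ch = weight.shape
    blocks = weight.reshape(out_ch, in_ch // m, m)
    mag = blocks.abs()

    # Hard N:M keep-mask inside each block of M weights.
    idx = mag.topk(n, dim=-1).indices
    keep = torch.zeros_like(blocks).scatter_(-1, idx, 1.0)

    # Importance queried along two axes: per-block and per-filter magnitude sums.
    block_score = mag.sum(dim=-1, keepdim=True)             # (out_ch, n_blocks, 1)
    filter_score = mag.sum(dim=(1, 2), keepdim=True)        # (out_ch, 1, 1)
    importance = block_score / block_score.mean() + filter_score / filter_score.mean()
    soft = keep * (1.0 + alpha * torch.sigmoid(importance))  # enhance kept weights

    # Incremental schedule: only the lowest-scoring fraction of blocks is sparsified.
    scores = block_score.squeeze(-1)                         # (out_ch, n_blocks)
    k = int(block_ratio * scores.numel())
    if k < scores.numel():
        thresh = scores.flatten().kthvalue(max(k, 1)).values
        dense = (scores > thresh).unsqueeze(-1).float()      # blocks left dense for now
        soft = soft * (1.0 - dense) + dense
    return soft.reshape(out_ch, in_ch)

# During training, block_ratio would ramp from 0 toward 1, and the masked weight
# would be used in the forward pass: w_masked = weight * soft_nm_mask(weight, ...)
```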

At inference time, the soft N:M masks are constants, so they can be precomputed and folded into the weights, adding no computational overhead and leaving the sparse pattern intact.
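A minimal sketch of that folding step, assuming the soft mask has already been computed and frozen as a constant tensor:

```python
import torch

@torch.no_grad()
def fold_mask(weight: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Multiply the constant soft mask into the weights once before deployment;
    the result keeps the same N:M zero pattern, and the forward pass then
    needs no extra mask multiplication."""
    return weight * mask

# Usage sketch (hypothetical layer object):
# layer.weight.copy_(fold_mask(layer.weight, precomputed_mask))
```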

Comprehensive Evaluation

MaxQ was evaluated across different CNN architectures and sparsity patterns, showing consistent gains that are especially pronounced on heavyweight architectures such as ResNet. For example, a 1:16 sparse ResNet50 trained with MaxQ reaches 74.6% top-1 accuracy on ImageNet, improving on the previous state of the art by more than 2.8%.

Moreover, MaxQ's multi-axis soft-masking approach also benefits downstream tasks beyond image classification, such as object detection and instance segmentation, even matching the performance of the dense (non-sparse) baselines.

Advantages and Practical Implications

MaxQ is not solely about achieving high compression. The method applies to varying N:M sparse patterns without significant modification, and MaxQ networks can be trained in a single pass, without the iterative pre-training or fine-tuning stages that previous methods required, which simplifies the pipeline.

Finally, MaxQ shows strong compatibility with quantization methods, even outperforming some predecessors. Combined with its preservation of the N:M sparsity structure, this makes MaxQ a notable step forward in optimizing networks for deployment on resource-constrained devices.

Conclusion

The Multi-Axis Query technique presents an interpretable, effective, and efficient means of exploiting N:M sparsity within CNNs. Its dynamic query-based mask generation and the progressive adaptation of sparsity throughout training lead to neural networks that retain high performance while meeting stringent resource constraints. MaxQ's demonstration across various computer vision tasks and CNN architectures may set a standard for future sparsity-based network optimizations.

Authors
  1. Jingyang Xiang
  2. Siqi Li
  3. Junhao Chen
  4. Zhuangzhi Chen
  5. Tianxin Huang
  6. Linpeng Peng
  7. Yong Liu