MaxQ: Multi-Axis Query for N:M Sparsity Network (2312.07061v2)
Abstract: N:M sparsity has received increasing attention for its favorable trade-off between performance and latency compared with structured and unstructured sparsity. However, existing N:M sparsity methods do not differentiate the relative importance of weights among blocks, leaving important weights underappreciated. Moreover, they apply N:M sparsity to the whole network directly, which causes severe information loss, so they remain sub-optimal. In this paper, we propose an efficient and effective Multi-Axis Query methodology, dubbed MaxQ, to rectify these problems. During training, MaxQ dynamically generates soft N:M masks by considering weight importance across multiple axes. This enhances the more important weights and ensures more effective updates. Meanwhile, a sparsity strategy that gradually increases the percentage of N:M weight blocks is applied, allowing the network to heal progressively from pruning-induced damage. At runtime, the N:M soft masks can be precomputed as constants and folded into the weights without distorting the sparse pattern or incurring additional computational overhead. Comprehensive experiments demonstrate that MaxQ achieves consistent improvements across diverse CNN architectures on various computer vision tasks, including image classification, object detection, and instance segmentation. For ResNet50 with a 1:16 sparse pattern, MaxQ achieves 74.6% top-1 accuracy on ImageNet, improving on the state-of-the-art by more than 2.8%. Code and checkpoints are available at https://github.com/JingyangXiang/MaxQ.
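To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of the idea: build an N:M mask whose kept entries are scaled above 1 according to importance queried along multiple axes (so more important weights are enhanced), while pruned entries are exactly 0, so the mask can later be folded into the weights without breaking the N:M pattern. The concrete query axes, fusion, and scaling here are illustrative assumptions rather than the paper's exact formulation, and `multi_axis_soft_nm_mask` is a hypothetical helper; see the official repository for the actual implementation.

```python
# Sketch only: magnitude-based importance queried along the filter axis and the
# block axis, fused into an enhancement factor for the kept weights of each
# N:M block. Not the paper's exact formulation.
import torch

def multi_axis_soft_nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4,
                            alpha: float = 0.5) -> torch.Tensor:
    out_c = weight.shape[0]
    flat = weight.abs().reshape(out_c, -1)                 # (out_c, K)
    assert flat.shape[1] % m == 0, "weights per filter must be divisible by m"
    blocks = flat.reshape(out_c, -1, m)                    # (out_c, K//m, m)

    # Query importance along two axes (assumed): per-filter and per-block,
    # then fuse them into a per-block enhancement factor.
    filter_imp = flat.mean(dim=1, keepdim=True)            # (out_c, 1)
    block_imp = blocks.mean(dim=-1)                        # (out_c, K//m)
    fused = (filter_imp / (filter_imp.mean() + 1e-12)) * \
            (block_imp / (block_imp.mean() + 1e-12))       # (out_c, K//m)

    # Hard N:M selection inside each block of m consecutive weights.
    keep_idx = blocks.topk(n, dim=-1).indices
    binary = torch.zeros_like(blocks)
    binary.scatter_(-1, keep_idx, 1.0)

    # Kept weights get a factor >= 1, pruned weights stay exactly 0, so folding
    # the mask into the weights preserves the N:M sparse pattern.
    soft = binary * (1.0 + alpha * fused.unsqueeze(-1))
    return soft.reshape_as(weight)

# Usage: precompute the mask once and fold it into the weights for deployment,
# incurring no extra cost at inference time.
w = torch.randn(64, 32, 3, 3)
w_deploy = w * multi_axis_soft_nm_mask(w, n=2, m=4)
```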
Authors: Jingyang Xiang, Siqi Li, Junhao Chen, Zhuangzhi Chen, Tianxin Huang, Linpeng Peng, Yong Liu