SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration (2310.06218v1)
Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread as a way to compress and accelerate models in resource-limited environments. By constraining N consecutive weights along the output channel to be group-wise non-zero, networks with 1$\times$N sparsity have recently gained tremendous popularity for three outstanding advantages: 1) large storage savings via a \emph{Block Sparse Row} matrix format; 2) excellent performance at high sparsity; 3) significant speedups on CPUs with Advanced Vector Extensions. Existing work selects and fine-tunes 1$\times$N sparse weights from dense pre-trained weights, leading to problems such as expensive training and memory-access costs, sub-optimal model quality, and an unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel \emph{\textbf{S}oft \textbf{U}niform \textbf{B}lock \textbf{P}runing} (SUBP) approach to train a uniform 1$\times$N sparse structured network from scratch. Specifically, our approach repeatedly allows pruned blocks to regrow into the network, based on block angular redundancy and importance sampling, in a uniform manner throughout the training process. This not only makes the model less dependent on pre-training and reduces both model redundancy and the risk of permanently pruning important blocks, but also yields a balanced workload across threads. Empirically, comprehensive experiments on ImageNet across various CNN architectures show that SUBP consistently outperforms existing 1$\times$N and structured sparsity methods, whether based on pre-trained models or trained from scratch. Source codes and models are available at \url{https://github.com/JingyangXiang/SUBP}.
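To make the uniform 1$\times$N pattern concrete, the NumPy sketch below builds a block mask that keeps the same number of 1$\times$N blocks in every group of N consecutive output channels, which is exactly the property that balances the workload across threads. This is a minimal illustration under assumed shapes and a simple L1-magnitude criterion: the helper name `uniform_1xn_mask` is hypothetical, and SUBP's actual block selection relies on block angular redundancy and importance-sampling-based regrowing rather than a one-shot magnitude ranking.

```python
import numpy as np

def uniform_1xn_mask(weight: np.ndarray, n: int = 4, sparsity: float = 0.5) -> np.ndarray:
    """Return a binary mask with uniform 1xN block sparsity.

    `weight` has shape (C_out, C_in * kh * kw) with C_out divisible by `n`.
    A 1xN block groups `n` consecutive output channels at one input
    position; every block row keeps the same number of blocks, which is
    what balances the per-thread workload.
    """
    c_out, k = weight.shape
    assert c_out % n == 0, "C_out must be divisible by N"
    blocks = weight.reshape(c_out // n, n, k)      # (block rows, N, K)
    scores = np.abs(blocks).sum(axis=1)            # L1 importance per block
    keep = int(round(k * (1.0 - sparsity)))        # identical count per row
    mask = np.zeros_like(scores)
    top = np.argsort(-scores, axis=1)[:, :keep]    # top-`keep` blocks per row
    np.put_along_axis(mask, top, 1.0, axis=1)
    # expand the block-level mask back to individual weights
    return np.repeat(mask[:, None, :], n, axis=1).reshape(c_out, k)

# Example: every group of N output channels ends up with the same number
# of surviving blocks, i.e. uniform sparsity across output channels.
w = np.random.randn(64, 128)
m = uniform_1xn_mask(w, n=4, sparsity=0.75)
print(m.reshape(16, 4, 128).sum(axis=(1, 2)))  # identical count per block row
```

Applying the mask as `w * m` during training while periodically re-scoring all blocks, so that previously zeroed blocks can re-enter the top-`keep` selection, approximates the "soft" regrowing loop described in the abstract.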