Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks (2403.19969v2)

Published 29 Mar 2024 in cs.CV and cs.LG

Abstract: Block pruning, which eliminates contiguous blocks of weights, is a structural pruning method that can significantly enhance the performance of neural processing units (NPUs). In industrial applications, an ideal block pruning algorithm should meet three key requirements: (1) maintain high accuracy across diverse models and tasks, as machine learning deployments on edge devices are typically accuracy-critical; (2) offer precise control over resource constraints to facilitate user adoption; and (3) provide convergence guarantees to prevent performance instability. However, to the best of our knowledge, no existing block pruning algorithm satisfies all three requirements simultaneously. In this paper, we introduce SMART (Separate, Dynamic, and Differentiable) pruning, a novel algorithm designed to address this gap. SMART leverages both weight and activation information to enhance accuracy, employs a differentiable top-k operator for precise control of resource constraints, and offers convergence guarantees under mild conditions. Extensive experiments involving seven models, four datasets, three different block types, and three computer vision tasks demonstrate that SMART pruning achieves state-of-the-art performance in block pruning.
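
For intuition about the "differentiable top-k operator" mentioned in the abstract, below is a minimal sketch of one common way to relax top-k selection into a differentiable mask: a temperature-controlled sigmoid around a score threshold, written in PyTorch. This is illustrative only; the function name soft_topk_mask, the midpoint thresholding, and the fixed temperature are assumptions for this sketch, not the SMART operator defined in the paper.

```python
import torch

def soft_topk_mask(scores: torch.Tensor, k: int, temperature: float = 1e-2) -> torch.Tensor:
    """Differentiable relaxation of a top-k mask (illustrative sketch, not the paper's operator).

    Returns values in (0, 1) that approach a hard 0/1 top-k selection as
    `temperature` goes to 0. The cutoff is placed halfway between the k-th
    and (k+1)-th largest scores, so exactly k entries lie above it.
    Requires k < scores.numel().
    """
    sorted_scores, _ = torch.sort(scores, descending=True)
    threshold = 0.5 * (sorted_scores[k - 1] + sorted_scores[k])
    return torch.sigmoid((scores - threshold) / temperature)

# Example: keep the top 3 of 8 block-importance scores.
scores = torch.randn(8, requires_grad=True)
mask = soft_topk_mask(scores, k=3)
pruned = scores * mask          # softly pruned scores
pruned.sum().backward()         # gradients flow through the relaxed mask
```

As the temperature is annealed toward zero, such a relaxation approaches a hard selection of exactly k blocks, which is one way a differentiable formulation can still enforce an exact resource budget at convergence.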

Authors (5)
  1. Guanhua Ding
  2. Zexi Ye
  3. Zhen Zhong
  4. Gang Li
  5. David Shao
