UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer (2401.06426v1)

Published 12 Jan 2024 in cs.CV and cs.AI

Abstract: Traditional channel-wise pruning methods, which reduce the number of network channels, struggle to effectively prune efficient CNN models that use depth-wise convolutional layers and certain efficient modules, such as the popular inverted residual block. Prior depth pruning methods, which reduce network depth, are unsuitable for some efficient models because of certain normalization layers. Moreover, fine-tuning a subnet after directly removing activation layers corrupts the original model weights, preventing the pruned model from reaching high performance. To address these issues, we propose a novel depth pruning method for efficient models, built on a novel block pruning strategy and a progressive training method for the subnet. We also extend the pruning method to vision transformer models. Experimental results demonstrate that our method consistently outperforms existing depth pruning methods across various pruning configurations. Applying our method to ConvNeXtV1, we obtain three pruned ConvNeXtV1 models that surpass most SOTA efficient models at comparable inference performance. Our method also achieves state-of-the-art pruning performance on the vision transformer model.
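
The abstract notes that abruptly removing activation layers corrupts the pretrained weights, and that a progressive training method avoids this. The details are not given above, so the sketch below is only a minimal illustration of one plausible reading: anneal each activation in the blocks selected for pruning toward the identity, so the block gradually becomes linear and can later be folded into its neighbors. The names `ProgressiveActivation` and `attach_progressive_activations`, and the linear annealing schedule, are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class ProgressiveActivation(nn.Module):
    """Blend a nonlinearity with identity: y = (1 - t) * act(x) + t * x.

    As t moves from 0 to 1 during fine-tuning, the wrapped activation is
    gradually removed, so the surrounding block becomes linear and can be
    merged into adjacent layers at export time (hypothetical sketch, not
    the paper's actual mechanism).
    """

    def __init__(self, act: nn.Module):
        super().__init__()
        self.act = act
        self.register_buffer("t", torch.zeros(()))  # annealing factor in [0, 1]

    def set_progress(self, t: float) -> None:
        self.t.fill_(min(max(t, 0.0), 1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (1.0 - self.t) * self.act(x) + self.t * x


def attach_progressive_activations(model: nn.Module) -> list:
    """Wrap every ReLU/GELU so the subnet can be fine-tuned while its
    activations are annealed toward identity. Returns the wrappers so a
    training loop can drive their schedules."""
    wrappers = []
    for name, module in list(model.named_children()):
        if isinstance(module, (nn.ReLU, nn.GELU)):
            wrapper = ProgressiveActivation(module)
            setattr(model, name, wrapper)
            wrappers.append(wrapper)
        else:
            wrappers.extend(attach_progressive_activations(module))
    return wrappers
```

In such a setup, the fine-tuning loop would call `set_progress(step / total_steps)` on each wrapper; once t reaches 1 the affected blocks are purely linear and can be reparameterized away. Whether this matches UPDP's actual schedule or block-merging procedure is not stated in the excerpt above.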

Authors (12)
  1. Ji Liu (285 papers)
  2. Dehua Tang (3 papers)
  3. Yuanxian Huang (3 papers)
  4. Li Zhang (693 papers)
  5. Xiaocheng Zeng (1 paper)
  6. Dong Li (429 papers)
  7. Mingjie Lu (6 papers)
  8. Jinzhang Peng (11 papers)
  9. Yu Wang (939 papers)
  10. Fan Jiang (57 papers)
  11. Lu Tian (58 papers)
  12. Ashish Sirasao (9 papers)
Citations (3)
