Order of Compression: A Systematic and Optimal Sequence to Combinationally Compress CNN (2403.17447v2)

Published 26 Mar 2024 in cs.LG, cs.CV, and cs.NE

Abstract: Model compression has gained significant popularity as a means to alleviate the computational and memory demands of machine learning models. Each compression technique leverages unique features to reduce the size of neural networks. Although intuitively combining different techniques may enhance compression effectiveness, we find that the order in which they are combined significantly influences performance. To identify the optimal sequence for compressing neural networks, we propose the Order of Compression, a systematic and optimal sequence for applying multiple compression techniques in the most effective order. We first establish the pairwise ordering between any two compression approaches, and then show that inserting an additional compression technique between any two does not break their pairwise ordering. From these pairwise foundations, an optimal overall order is obtained with topological sorting. Validated on image-based regression and classification networks across different datasets, our proposed Order of Compression significantly reduces computational costs by up to 859 times on ResNet34, with negligible accuracy loss (-0.09% for CIFAR10) compared to the baseline model. We believe our simple yet effective exploration of the order of compression will shed light on the practice of model compression.
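
The core computational step described in the abstract, deriving a single global compression sequence from pairwise precedence results via topological sorting, can be sketched in a few lines. The technique names and the precedence edges below are hypothetical placeholders for illustration only; they are not the orderings established in the paper.

```python
# Minimal sketch: turn pairwise "A should come before B" findings into one
# global compression order via topological sorting.
# NOTE: the edges below are illustrative assumptions, NOT the paper's results.
from graphlib import TopologicalSorter

# Map each technique to the set of techniques that must run before it
# (hypothetical precedence constraints for demonstration).
pairwise_order = {
    "knowledge_distillation": set(),
    "pruning": {"knowledge_distillation"},
    "quantization": {"pruning"},
    "coding": {"quantization"},
}

ts = TopologicalSorter(pairwise_order)
print(" -> ".join(ts.static_order()))
# e.g. knowledge_distillation -> pruning -> quantization -> coding
```

The sorter returns any order consistent with the given pairwise constraints; the paper's contribution lies in establishing which pairwise orderings between compression techniques actually hold and showing they compose into a single optimal sequence.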

Authors (5)
  1. Yingtao Shen (3 papers)
  2. Minqing Sun (1 paper)
  3. Jie Zhao (214 papers)
  4. An Zou (15 papers)
  5. Jianzhe Lin (15 papers)
