Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning (2209.14624v3)

Published 29 Sep 2022 in cs.LG and cs.CV

Abstract: Pruning neural networks became popular over the last decade, after it was shown that a large number of weights can be safely removed from modern neural networks without compromising accuracy. Numerous pruning methods have been proposed since, each claiming to improve on prior art, albeit at the cost of increasingly complex pruning methodologies. These methodologies include utilizing importance scores, getting feedback through back-propagation, or applying heuristics-based pruning rules, among others. In this work, we question whether this pattern of introducing complexity is really necessary to achieve better pruning results. We benchmark these SOTA techniques against a simple pruning baseline, Global Magnitude Pruning (Global MP), which ranks weights by magnitude and prunes the smallest ones. Surprisingly, we find that vanilla Global MP performs very well against the SOTA techniques. When considering the sparsity-accuracy trade-off, Global MP performs better than all SOTA techniques at all sparsity ratios. When considering the FLOPs-accuracy trade-off, some SOTA techniques outperform Global MP at lower sparsity ratios; however, Global MP begins to perform well at high sparsity ratios and performs very well at extremely high sparsity ratios. Moreover, we find that a common issue many pruning algorithms run into at high sparsity rates, namely layer-collapse, can be easily fixed in Global MP. We explore why layer-collapse occurs in networks and how it can be mitigated in Global MP by utilizing a technique called Minimum Threshold. We showcase the above findings on various models (WRN-28-8, ResNet-32, ResNet-50, MobileNet-V1 and FastGRNN) and multiple datasets (CIFAR-10, ImageNet and HAR-2). Code is available at https://github.com/manasgupta-1/GlobalMP.
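The abstract describes Global MP as ranking all weights across the network by magnitude, pruning the smallest, and guarding against layer-collapse with a Minimum Threshold (MT) that forces each layer to retain some weights. The sketch below is a minimal PyTorch illustration of that idea based only on the abstract's description; the function name, the mask-based weight zeroing, and the `min_keep_frac` parameterization of MT are illustrative assumptions and do not reflect the authors' actual implementation in the linked repository.

```python
# Minimal sketch of Global Magnitude Pruning (Global MP) with a per-layer
# Minimum Threshold (MT), written against the abstract's description only.
import torch
import torch.nn as nn


def global_magnitude_prune(model: nn.Module, sparsity: float, min_keep_frac: float = 0.0):
    """Zero out the globally smallest-magnitude weights.

    sparsity      -- fraction of prunable weights to remove across all layers.
    min_keep_frac -- Minimum Threshold: fraction of weights each layer must
                     retain, which prevents layer-collapse at high sparsity.
    Returns a dict of binary masks (1 = keep, 0 = pruned) keyed by layer name.
    """
    # Collect prunable weight tensors (conv / linear weights, not biases).
    weights = {name: p for name, p in model.named_parameters()
               if p.dim() > 1 and "weight" in name}

    # Global threshold: magnitude of the k-th smallest weight over all layers.
    all_mags = torch.cat([w.detach().abs().flatten() for w in weights.values()])
    k = int(sparsity * all_mags.numel())
    threshold = all_mags.kthvalue(k).values if k > 0 else torch.tensor(0.0)

    masks = {}
    for name, w in weights.items():
        mag = w.detach().abs()
        mask = (mag > threshold).float()

        # Minimum Threshold: if the global cut leaves this layer with too few
        # weights, keep its largest-magnitude weights up to the floor instead.
        min_keep = int(min_keep_frac * w.numel())
        if min_keep > 0 and mask.sum().item() < min_keep:
            topk_idx = mag.flatten().topk(min_keep).indices
            mask = torch.zeros(w.numel())
            mask[topk_idx] = 1.0
            mask = mask.view_as(mag)

        w.data.mul_(mask)  # apply the mask in place
        masks[name] = mask
    return masks


if __name__ == "__main__":
    # Toy usage: prune a small MLP to 95% sparsity with a 2% per-layer floor.
    net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    masks = global_magnitude_prune(net, sparsity=0.95, min_keep_frac=0.02)
    kept = sum(m.sum().item() for m in masks.values())
    total = sum(m.numel() for m in masks.values())
    print(f"kept {kept:.0f} / {total} weights ({1 - kept / total:.1%} sparsity)")
```

The single global threshold is what distinguishes Global MP from layer-wise magnitude pruning: sparsity is allocated automatically across layers, and the MT floor only intervenes when that allocation would empty a layer.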
