Generalization Bounds for Magnitude-Based Pruning via Sparse Matrix Sketching (2305.18789v2)

Published 30 May 2023 in cs.LG

Abstract: In this paper, we derive a novel bound on the generalization error of magnitude-based pruning of overparameterized neural networks. Our work builds on the bounds in Arora et al. [2018], where the error depends on (i) the approximation error induced by pruning and (ii) the number of parameters in the pruned model, and improves upon standard norm-based generalization bounds. The pruned estimates obtained using our new magnitude-based compression algorithm are close to the unpruned functions with high probability, which improves the first criterion. Using sparse matrix sketching, the space of pruned matrices can be represented efficiently in the space of dense matrices of much smaller dimension, which reduces the second criterion. This leads to a stronger generalization bound than many state-of-the-art methods, thereby breaking new ground in algorithm development for pruning and in bounding the generalization error of overparameterized models. Beyond this, we extend our results to obtain a generalization bound for iterative pruning [Frankle and Carbin, 2018]. We empirically verify the success of this new method on ReLU-activated feed-forward networks on the MNIST and CIFAR-10 datasets.
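
As a rough illustration of the compression step the abstract refers to, the sketch below applies plain magnitude-based pruning (and a naive iterative variant) to a single weight matrix in NumPy. The helper names, the thresholding rule, the uniform round schedule, and the omission of retraining and of the sparse-matrix-sketching step are all simplifying assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries of a weight matrix.

    `sparsity` is the fraction of entries to remove (e.g. 0.9 keeps only
    the largest 10% of weights by absolute value).
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest magnitude; entries at or below it are pruned.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def iterative_prune(weights: np.ndarray, final_sparsity: float, rounds: int) -> np.ndarray:
    """Reach the target sparsity gradually, pruning a bit more each round.

    Retraining between rounds, as in Frankle and Carbin [2018], is omitted here.
    """
    pruned = weights.copy()
    for r in range(1, rounds + 1):
        pruned = magnitude_prune(pruned, final_sparsity * r / rounds)
    return pruned

# Example: prune 90% of a random 256x256 layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
W_pruned = magnitude_prune(W, sparsity=0.9)
print(f"kept {np.count_nonzero(W_pruned) / W.size:.1%} of the weights")
```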

References (33)
  1. Stronger generalization bounds for deep nets via a compression approach. CoRR, abs/1802.05296, 2018. URL http://arxiv.org/abs/1802.05296.
  2. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
  3. Spectrally-normalized margin bounds for neural networks. CoRR, abs/1706.08498, 2017. URL http://arxiv.org/abs/1706.08498.
  4. The generalization-stability tradeoff in neural network pruning. Advances in Neural Information Processing Systems, 33:20852–20864, 2020.
  5. A survey of model compression and acceleration for deep neural networks. CoRR, abs/1710.09282, 2017. URL http://arxiv.org/abs/1710.09282.
  6. Memory bounded deep convolutional networks. CoRR, abs/1412.1442, 2014. URL http://arxiv.org/abs/1412.1442.
  7. Generalization bounds for neural networks via approximate description length. Advances in Neural Information Processing Systems, 32, 2019.
  8. Sketching sparse matrices. CoRR, abs/1303.6544, 2013. URL http://arxiv.org/abs/1303.6544.
  9. Sharpness-aware minimization for efficiently improving generalization. CoRR, abs/2010.01412, 2020. URL https://arxiv.org/abs/2010.01412.
  10. The lottery ticket hypothesis: Training pruned neural networks. CoRR, abs/1803.03635, 2018. URL http://arxiv.org/abs/1803.03635.
  11. Norm-based generalization bounds for compositionally sparse neural networks. arXiv preprint arXiv:2301.12033, 2023.
  12. Deep Learning. MIT Press, Cambridge, MA, USA, 2016. http://www.deeplearningbook.org.
  13. Learning both weights and connections for efficient neural network. Advances in neural information processing systems, 28, 2015.
  14. What do compressed deep neural networks forget? arXiv preprint arXiv:1911.05248, 2019.
  15. Pruning’s effect on generalization through the lens of training and regularization. arXiv preprint arXiv:2210.13738, 2022.
  16. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  17. CIFAR-10 (Canadian Institute for Advanced Research). URL http://www.cs.toronto.edu/~kriz/cifar.html.
  18. Tyler LaBonte. Milkshake: Quick and extendable experimentation with classification models. http://github.com/tmlabonte/milkshake, 2023.
  19. Rafal Latala. Some estimates of norms of random matrices. Proceedings of the American Mathematical Society, 133:1273–1282, 05 2005. doi: 10.2307/4097777.
  20. Supervised autoencoders: Improving generalization performance with unsupervised regularizers. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/2a38a4a9316c49e5a833517c45d31070-Paper.pdf.
  21. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.
  22. Pruning filters for efficient convnets. CoRR, abs/1608.08710, 2016. URL http://arxiv.org/abs/1608.08710.
  23. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE international conference on computer vision, pages 2736–2744, 2017.
  24. Generalization bounds for deep convolutional neural networks, 2020.
  25. Proving the lottery ticket hypothesis: Pruning is all you need. CoRR, abs/2002.00585, 2020. URL https://arxiv.org/abs/2002.00585.
  26. Pruning convolutional neural networks for resource efficient inference, 2017.
  27. On activation function coresets for network pruning. CoRR, abs/1907.04018, 2019. URL http://arxiv.org/abs/1907.04018.
  28. Norm-based capacity control in neural networks. In Conference on learning theory, pages 1376–1401. PMLR, 2015.
  29. A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks. CoRR, abs/1707.09564, 2017. URL http://arxiv.org/abs/1707.09564.
  30. A probabilistic approach to neural network pruning. CoRR, abs/2105.10065, 2021. URL https://arxiv.org/abs/2105.10065.
  31. The power of two random choices: A survey of techniques and results. 10 2000. doi: 10.1007/978-1-4615-0013-1_9.
  32. Soft weight-sharing for neural network compression, 2017.
  33. Roman Vershynin. High-dimensional probability. 2019. URL https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf.