Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning (2403.15022v3)
Abstract: The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to re-train the sparser networks obtained via iterative magnitude pruning. An explanation of why the specific initialization proposed by the lottery ticket hypothesis tends to yield better generalization (and training) performance has been lacking. Moreover, the underlying principles of iterative magnitude pruning, such as the pruning of smaller-magnitude weights and the role of the iterative process itself, remain only partially understood and explained. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
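For context, the sketch below outlines the iterative magnitude pruning (IMP) procedure with rewinding to the original initialization that the abstract refers to. It is a minimal illustration, not code from the paper: the `train` routine, the number of rounds, and the per-round pruning fraction are hypothetical placeholders.

```python
import numpy as np

def imp_with_rewinding(w_init, train, rounds=5, prune_frac=0.2):
    """Minimal sketch of iterative magnitude pruning with rewinding.

    w_init     : flat array of initial weights (the 'lottery ticket' init)
    train      : hypothetical routine that trains the masked weights to convergence
    rounds     : number of prune/re-train iterations (assumed value)
    prune_frac : fraction of remaining weights removed each round (assumed value)
    """
    mask = np.ones_like(w_init, dtype=bool)
    for _ in range(rounds):
        # Re-train the current sparse network starting from the original init.
        w = train(w_init * mask, mask)
        # Rank surviving weights by trained magnitude and drop the smallest fraction.
        surviving = np.abs(w[mask])
        threshold = np.quantile(surviving, prune_frac)
        mask &= np.abs(w) > threshold
        # Rewind: the next round again trains from w_init, not from the trained w.
    return mask, w_init * mask
```

Rewinding the surviving weights to `w_init` after each pruning round, rather than continuing from the trained weights or re-initializing randomly, is the specific choice whose effect on generalization and training performance the paper studies empirically.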