Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning (2403.15022v3)

Published 22 Mar 2024 in cs.LG

Abstract: The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to re-train the sparser networks obtained via the iterative magnitude pruning process. An explanation for why the specific initialization proposed by the lottery ticket hypothesis tends to yield better generalization (and training) performance has been lacking. Moreover, the principles underlying iterative magnitude pruning, such as why smaller-magnitude weights are pruned and what role the iterative process plays, are not fully understood. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss-landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
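
The abstract centers on iterative magnitude pruning (IMP) with rewinding to the original initialization, the procedure the lottery ticket hypothesis builds on. The sketch below is a minimal illustration of that procedure in PyTorch, not the paper's experimental setup: the model, data loader, pruning fraction, number of rounds, and training loop are all illustrative assumptions. Each round trains the masked network, removes the smallest-magnitude surviving weights, and rewinds the survivors to their values at initialization.

```python
import copy
import torch
import torch.nn as nn


def train(model, loader, mask, epochs=1, lr=0.1):
    """Train the network while keeping pruned weights frozen at zero."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            # Re-apply the mask so weights pruned earlier stay exactly zero.
            with torch.no_grad():
                for name, p in model.named_parameters():
                    if name in mask:
                        p.mul_(mask[name])
    return model


def imp_with_rewinding(model, loader, rounds=3, prune_frac=0.2):
    """One IMP cycle per round: train, prune the smallest-magnitude surviving
    weights, then rewind the surviving weights to their initialization values."""
    init_state = copy.deepcopy(model.state_dict())  # theta_0, the original init
    mask = {n: torch.ones_like(p) for n, p in model.named_parameters()
            if p.dim() > 1}  # prune weight matrices/filters, not biases
    for _ in range(rounds):
        train(model, loader, mask)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name not in mask:
                    continue
                surviving = p.abs()[mask[name].bool()]
                k = int(prune_frac * surviving.numel())
                if k == 0:
                    continue
                # Remove the k smallest-magnitude weights among the survivors.
                threshold = surviving.sort().values[k - 1]
                mask[name][p.abs() <= threshold] = 0.0
            # Rewind: reset every surviving weight to its value at initialization.
            for name, p in model.named_parameters():
                p.copy_(init_state[name])
                if name in mask:
                    p.mul_(mask[name])
    return model, mask
```

A typical call would look like `imp_with_rewinding(my_model, my_loader)`, where the 20% pruning fraction per round and the three rounds are common but arbitrary choices; the paper studies the solutions produced at the successive sparsity levels such a loop generates.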

