Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training (2306.12230v2)

Published 21 Jun 2023 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: Dynamic Sparse Training (DST) is a rapidly evolving area of research that seeks to optimize the sparse initialization of a neural network by adapting its topology during training. It has been shown that under specific conditions, DST is able to outperform dense models. The key components of this framework are the pruning and growing criteria, which are repeatedly applied during the training process to adjust the network's sparse connectivity. While the growing criterion's impact on DST performance is relatively well studied, the influence of the pruning criterion remains overlooked. To address this issue, we design and perform an extensive empirical analysis of various pruning criteria to better understand their impact on the dynamics of DST solutions. Surprisingly, we find that most of the studied methods yield similar results. The differences become more significant in the low-density regime, where the best performance is predominantly given by the simplest technique: magnitude-based pruning. The code is provided at https://github.com/alooow/fantastic_weights_paper
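To make the prune-and-grow cycle described in the abstract concrete, below is a minimal PyTorch-style sketch of one topology update on a single layer, using the magnitude-based pruning criterion highlighted above paired with random regrowth (as in SET). The function name `dst_update`, the `prune_frac` parameter, and the dense-weight-plus-binary-mask representation are illustrative assumptions, not the paper's implementation; the actual code is in the linked repository.

```python
import torch

@torch.no_grad()
def dst_update(weight: torch.Tensor, mask: torch.Tensor, prune_frac: float = 0.3):
    """One prune-and-grow step on a single layer (illustrative sketch, not the paper's code).

    Removes the smallest-magnitude active weights and regrows the same number of
    connections at random, previously inactive positions, so the layer's density
    stays constant across the update.
    """
    active = mask.bool()
    n_prune = int(prune_frac * active.sum().item())
    if n_prune == 0:
        return weight, mask

    # Pruning criterion: weight magnitude (inactive entries masked out of the ranking).
    scores = weight.abs().masked_fill(~active, float("inf"))
    drop = torch.topk(scores.flatten(), n_prune, largest=False).indices
    mask.view(-1)[drop] = 0
    weight.view(-1)[drop] = 0.0

    # Growth criterion: random regrowth among positions that were inactive before pruning.
    inactive = (~active).flatten().nonzero(as_tuple=True)[0]
    grow = inactive[torch.randperm(inactive.numel())[:n_prune]]
    mask.view(-1)[grow] = 1
    weight.view(-1)[grow] = 0.0  # newly grown connections start from zero

    return weight, mask

# Example usage on a toy layer at roughly 10% density.
w = torch.randn(128, 64)
m = (torch.rand(128, 64) < 0.1).float()
w = w * m
w, m = dst_update(w, m, prune_frac=0.3)
```

Replacing the magnitude score with a different saliency measure yields a different pruning criterion of the kind the paper compares, while the rest of the prune-and-grow loop stays unchanged.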

Authors (4)
  1. Aleksandra I. Nowak (2 papers)
  2. Bram Grooten (9 papers)
  3. Decebal Constantin Mocanu (52 papers)
  4. Jacek Tabor (106 papers)
Citations (6)