Partially Frozen Random Networks Contain Compact Strong Lottery Tickets

Published 20 Feb 2024 in cs.LG, cs.AI, and stat.ML | arXiv:2402.14029v3

Abstract: Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning--strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs could also be found within a randomly pruned source network. This phenomenon can be exploited to further compress the small memory size required by SLTs. However, their method is limited to SLTs that are even sparser than the source, leading to worse accuracy due to unintentionally high sparsity. This paper proposes a method for reducing the SLT memory size without restricting the sparsity of the SLTs that can be found. A random subset of the initial weights is frozen by either permanently pruning them or locking them as a fixed part of the SLT, resulting in a smaller model size. Experimental results show that Edge-Popup (Ramanujan et al., 2020; Sreenivasan et al., 2022) finds SLTs with better accuracy-to-model size trade-off within frozen networks than within dense or randomly pruned source networks. In particular, freezing $70\%$ of a ResNet on ImageNet provides $3.3 \times$ compression compared to the SLT found within a dense counterpart, raises accuracy by up to $14.12$ points compared to the SLT found within a randomly pruned counterpart, and offers a better accuracy-model size trade-off than both.
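The abstract describes freezing a random subset of the untrained weights, either permanently pruning them or locking them into the ticket, while an Edge-Popup-style score search selects the rest of the supermask. The sketch below illustrates that idea in PyTorch. It is a minimal sketch under stated assumptions: the class name `FrozenSupermaskLinear`, the 50/50 split of frozen weights into pruned and locked, the `keep_rate` parameter, and the straight-through score update are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenSupermaskLinear(nn.Module):
    """Toy linear layer: weights stay at their random init, a random subset is
    frozen (pruned to zero or locked into the ticket), and Edge-Popup-style
    scores select which of the remaining weights join the strong lottery ticket."""

    def __init__(self, in_features, out_features, freeze_rate=0.7, keep_rate=0.5):
        super().__init__()
        # Fixed random weights: never updated (strong lottery ticket setting).
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features), requires_grad=False
        )
        nn.init.kaiming_normal_(self.weight)

        # Freeze `freeze_rate` of the entries; split the frozen set 50/50 into
        # permanently pruned (forced out) and permanently locked (forced in).
        frozen = torch.rand(out_features, in_features) < freeze_rate
        locked = frozen & (torch.rand(out_features, in_features) < 0.5)
        self.register_buffer("frozen", frozen)
        self.register_buffer("locked", locked.float())

        # Scores are the only trainable parameters (the weights are not).
        self.scores = nn.Parameter(torch.rand(out_features, in_features))
        self.keep_rate = keep_rate

    def forward(self, x):
        # Rank only the non-frozen weights; keep the top `keep_rate` fraction.
        scores = self.scores.masked_fill(self.frozen, float("-inf"))
        k = int(self.keep_rate * (~self.frozen).sum().item())
        hard = torch.zeros_like(self.scores)
        if k > 0:
            idx = scores.flatten().topk(k).indices
            hard.view(-1)[idx] = 1.0
        hard = hard + self.locked  # locked weights are always part of the mask

        # Straight-through estimator: the forward pass uses the hard 0/1 mask,
        # while gradients flow to the scores so they can be trained with SGD.
        # (A real implementation would also zero gradients at frozen positions.)
        mask = hard.detach() - self.scores.detach() + self.scores
        return F.linear(x, self.weight * mask)


# Hypothetical usage: during training, only `layer.scores` would be optimized.
layer = FrozenSupermaskLinear(128, 10, freeze_rate=0.7)
out = layer(torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 10])
```

The compression argument follows from the same split: pruned weights need no storage at all, and locked weights need only a single shared flag rather than a per-weight mask bit plus a random value, which is how freezing shrinks the memory footprint of the found SLT without forcing it to be sparser than desired.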

References (35)
  1. Burkholz, R. Most activation functions can win the lottery without excessive depth. Proc. Adv. Neural Inform. Process. Syst., 35:18707–18720, 2022a.
  2. Burkholz, R. Convolutional and residual networks provably contain lottery tickets. In Proc. Int. Conf. Mach. Learn., pp.  2414–2433. PMLR, 2022b.
  3. On the existence of universal lottery tickets. In Proc. Int. Conf. Learn. Repr., 2022.
  4. Proving the lottery ticket hypothesis for convolutional neural networks. In Proc. Int. Conf. Learn. Repr., 2021.
  5. Multi-prize lottery ticket hypothesis: Finding accurate binary neural networks by pruning a randomly weighted network. In Proc. Int. Conf. Learn. Repr., 2021.
  6. Rigging the lottery: Making all tickets winners. In Proc. Int. Conf. Mach. Learn., pp.  2943–2952. PMLR, 2020.
  7. A general framework for proving the equivariant strong lottery ticket hypothesis. In Proc. Int. Conf. Learn. Repr., 2023.
  8. Plant ’n’ seek: Can you find the winning ticket? In Proc. Int. Conf. Learn. Repr., 2022.
  9. Why random pruning is all we need to start sparse. In Proc. Int. Conf. Mach. Learn., pp.  10542–10570. PMLR, 2023.
  10. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proc. IEEE Int. Conf. Comput. Vis., pp.  1026–1034, 2015.
  11. Deep residual learning for image recognition. In Proc. IEEE Comput. Soc. Conf. Comput. Vis. and Pattern Recognit., pp.  770–778, 2016.
  12. Hiddenite: 4K-PE hidden network inference 4D-tensor engine exploiting on-chip model construction achieving 34.8-to-16.0 TOPS/W for CIFAR-100 and ImageNet. In Proc. IEEE Int. Solid-State Circuits Conf., volume 65, pp.  1–3. IEEE, 2022.
  13. Open graph benchmark: Datasets for machine learning on graphs. Proc. Adv. Neural Inform. Process. Syst., 33:22118–22133, 2020.
  14. You can have better graph neural networks by not training weights at all: Finding untrained GNNs tickets. In Learn. of Graphs Conf., 2022.
  15. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proc. Int. Conf. Mach. Learn., pp. 448–456. PMLR, 2015.
  16. Krizhevsky, A. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, Toronto, 2009.
  17. Deeper insights into graph convolutional networks for semi-supervised learning. In Proc. AAAI Conf. on Artif. Intell., volume 32, 2018.
  18. Recurrent residual networks contain stronger lottery tickets. IEEE Access, 11:16588–16604, 2023. doi: 10.1109/ACCESS.2023.3245808.
  19. SGDR: Stochastic gradient descent with warm restarts. In Proc. Int. Conf. Learn. Repr., 2017.
  20. Decoupled weight decay regularization. In Proc. Int. Conf. Learn. Repr., 2019.
  21. Lueker, G. S. Exponentially small bounds on the expected optimum of the partition and subset sum problems. Random Struct. Algor., 12(1):51–62, 1998.
  22. Proving the lottery ticket hypothesis: Pruning is all you need. In Proc. Int. Conf. Mach. Learn., pp.  6682–6691. PMLR, 2020.
  23. Multicoated supermasks enhance hidden networks. In Proc. Int. Conf. Mach. Learn., pp.  17045–17055, 2022.
  24. Logarithmic pruning is all you need. Proc. Adv. Neural Inform. Process. Syst., 33:2925–2934, 2020.
  25. Optimal lottery tickets via subset sum: Logarithmic over-parameterization is sufficient. Proc. Adv. Neural Inform. Process. Syst., 33:2599–2610, 2020.
  26. Dense for the price of sparse: Improved performance of sparsely initialized networks via a subspace offset. In Proc. Int. Conf. Mach. Learn., pp.  8620–8629. PMLR, 2021.
  27. What’s hidden in a randomly weighted neural network? In Proc. IEEE Comput. Soc. Conf. Comput. Vis. and Pattern Recognit., pp.  11893–11902, 2020.
  28. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  29. Rare gems: Finding lottery tickets at initialization. Proc. Adv. Neural Inform. Process. Syst., 35:14529–14540, 2022.
  30. Freezenet: Full performance by reduced storage costs. In Proc. Asian Conf. Comput. Vis., 2020.
  31. How powerful are graph neural networks? In Proc. Int. Conf. Learn. Repr., 2019.
  32. Multicoated and folded graph neural networks with strong lottery tickets. In Learn. of Graphs Conf., 2023.
  33. Can we find strong lottery tickets in generative models? In Proc. AAAI Conf. on Artif. Intell., volume 37, pp.  3267–3275, 2023.
  34. Deconstructing lottery tickets: Zeros, signs, and the supermask. Proc. Adv. Neural Inform. Process. Syst., 32, 2019.
  35. Effective sparsification of neural networks with global sparsity constraint. In Proc. IEEE Comput. Soc. Conf. Comput. Vis. and Pattern Recognit., pp.  3599–3608, 2021.