Partially Frozen Random Networks Contain Compact Strong Lottery Tickets
Abstract: Randomly initialized dense networks contain subnetworks that achieve high accuracy without any weight training, known as strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs can also be found within a randomly pruned source network, a phenomenon that can be exploited to further reduce the already small memory footprint of SLTs. However, their method only finds SLTs that are even sparser than the source network, which degrades accuracy because the resulting sparsity is unintentionally high. This paper proposes a method that reduces SLT memory size without restricting the sparsity of the SLTs that can be found. A random subset of the initial weights is frozen, either by permanently pruning those weights or by locking them as a fixed part of the SLT, resulting in a smaller model size. Experimental results show that Edge-Popup (Ramanujan et al., 2020; Sreenivasan et al., 2022) finds SLTs with a better accuracy-to-model-size trade-off within frozen networks than within dense or randomly pruned source networks. In particular, freezing $70\%$ of a ResNet on ImageNet provides $3.3\times$ compression compared to the SLT found within a dense counterpart, raises accuracy by up to $14.12$ points compared to the SLT found within a randomly pruned counterpart, and offers a better accuracy-to-model-size trade-off than both.
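The freezing scheme described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the 70% freeze fraction, the even split of frozen weights into pruned and locked subsets, and the use of a flat weight vector are all assumptions made for the example. Only the free (non-frozen) weights need a stored supermask bit, which is where the memory saving comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

def partition_weights(n, freeze_frac=0.7, lock_frac=0.5, rng=rng):
    """Randomly freeze a fraction of the n initial weights.
    Frozen weights are either pruned (fixed at zero) or locked
    (fixed at their random initial value); only the remaining
    'free' weights need a stored mask bit."""
    idx = rng.permutation(n)
    n_frozen = int(freeze_frac * n)
    frozen, free = idx[:n_frozen], idx[n_frozen:]
    n_locked = int(lock_frac * n_frozen)
    locked, pruned = frozen[:n_locked], frozen[n_locked:]
    return pruned, locked, free

def edge_popup_mask(scores, free, k):
    """Edge-Popup-style selection: among the free weights, keep the
    k with the highest scores. (In the real algorithm the scores are
    trained by SGD with a straight-through estimator; here they are
    random placeholders.)"""
    order = free[np.argsort(scores[free])[::-1]]
    mask = np.zeros(scores.size, dtype=bool)
    mask[order[:k]] = True
    return mask

n = 1000
w = rng.standard_normal(n)              # random init, never trained
pruned, locked, free = partition_weights(n)
mask = edge_popup_mask(rng.standard_normal(n), free, k=150)

# Effective SLT weights: locked weights plus the masked free weights;
# pruned weights stay at zero.
slt = np.zeros_like(w)
slt[locked] = w[locked]
slt[mask] = w[mask]
```

Note that the pruned/locked partition is drawn once at initialization and never revisited; the search only optimizes which of the free weights to keep.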
- Burkholz, R. Most activation functions can win the lottery without excessive depth. Proc. Adv. Neural Inform. Process. Syst., 35:18707–18720, 2022a.
- Burkholz, R. Convolutional and residual networks provably contain lottery tickets. In Proc. Int. Conf. Mach. Learn., pp. 2414–2433. PMLR, 2022b.
- Burkholz, R., Laha, N., Mukherjee, R., and Gotovos, A. On the existence of universal lottery tickets. In Proc. Int. Conf. Learn. Repr., 2022.
- Proving the lottery ticket hypothesis for convolutional neural networks. In Proc. Int. Conf. Learn. Repr., 2021.
- Diffenderfer, J. and Kailkhura, B. Multi-prize lottery ticket hypothesis: Finding accurate binary neural networks by pruning a randomly weighted network. In Proc. Int. Conf. Learn. Repr., 2021.
- Evci, U., Gale, T., Menick, J., Castro, P. S., and Elsen, E. Rigging the lottery: Making all tickets winners. In Proc. Int. Conf. Mach. Learn., pp. 2943–2952. PMLR, 2020.
- A general framework for proving the equivariant strong lottery ticket hypothesis. In Proc. Int. Conf. Learn. Repr., 2023.
- Fischer, J. and Burkholz, R. Plant ’n’ seek: Can you find the winning ticket? In Proc. Int. Conf. Learn. Repr., 2022.
- Gadhikar, A., Mukherjee, S., and Burkholz, R. Why random pruning is all we need to start sparse. In Proc. Int. Conf. Mach. Learn., pp. 10542–10570. PMLR, 2023.
- He, K., Zhang, X., Ren, S., and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proc. IEEE Int. Conf. Comput. Vis., pp. 1026–1034, 2015.
- He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proc. IEEE Comput. Soc. Conf. Comput. Vis. and Pattern Recognit., pp. 770–778, 2016.
- Hiddenite: 4K-PE hidden network inference 4D-tensor engine exploiting on-chip model construction achieving 34.8-to-16.0 TOPS/W for CIFAR-100 and ImageNet. In Proc. IEEE Int. Solid-State Circuits Conf., volume 65, pp. 1–3. IEEE, 2022.
- Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., and Leskovec, J. Open graph benchmark: Datasets for machine learning on graphs. Proc. Adv. Neural Inform. Process. Syst., 33:22118–22133, 2020.
- You can have better graph neural networks by not training weights at all: Finding untrained GNNs tickets. In Learn. of Graphs Conf., 2022.
- Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proc. Int. Conf. Mach. Learn., pp. 448–456. PMLR, 2015.
- Krizhevsky, A. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, Toronto, 2009.
- Li, Q., Han, Z., and Wu, X.-M. Deeper insights into graph convolutional networks for semi-supervised learning. In Proc. AAAI Conf. on Artif. Intell., volume 32, 2018.
- Recurrent residual networks contain stronger lottery tickets. IEEE Access, 11:16588–16604, 2023. doi: 10.1109/ACCESS.2023.3245808.
- Loshchilov, I. and Hutter, F. SGDR: Stochastic gradient descent with warm restarts. In Proc. Int. Conf. Learn. Repr., 2017.
- Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In Proc. Int. Conf. Learn. Repr., 2019.
- Lueker, G. S. Exponentially small bounds on the expected optimum of the partition and subset sum problems. Random Struct. Algor., 12(1):51–62, 1998.
- Malach, E., Yehudai, G., Shalev-Shwartz, S., and Shamir, O. Proving the lottery ticket hypothesis: Pruning is all you need. In Proc. Int. Conf. Mach. Learn., pp. 6682–6691. PMLR, 2020.
- Multicoated supermasks enhance hidden networks. In Proc. Int. Conf. Mach. Learn., pp. 17045–17055, 2022.
- Orseau, L., Hutter, M., and Rivasplata, O. Logarithmic pruning is all you need. Proc. Adv. Neural Inform. Process. Syst., 33:2925–2934, 2020.
- Pensia, A., Rajput, S., Nagle, A., Vishwakarma, H., and Papailiopoulos, D. Optimal lottery tickets via subset sum: Logarithmic over-parameterization is sufficient. Proc. Adv. Neural Inform. Process. Syst., 33:2599–2610, 2020.
- Dense for the price of sparse: Improved performance of sparsely initialized networks via a subspace offset. In Proc. Int. Conf. Mach. Learn., pp. 8620–8629. PMLR, 2021.
- Ramanujan, V., Wortsman, M., Kembhavi, A., Farhadi, A., and Rastegari, M. What’s hidden in a randomly weighted neural network? In Proc. IEEE Comput. Soc. Conf. Comput. Vis. and Pattern Recognit., pp. 11893–11902, 2020.
- Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Sreenivasan, K., Sohn, J.-y., Yang, L., Grinde, M., Nagle, A., Wang, H., Xing, E., Lee, K., and Papailiopoulos, D. Rare gems: Finding lottery tickets at initialization. Proc. Adv. Neural Inform. Process. Syst., 35:14529–14540, 2022.
- Wimmer, P., Mehnert, J., and Condurache, A. P. FreezeNet: Full performance by reduced storage costs. In Proc. Asian Conf. Comput. Vis., 2020.
- Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In Proc. Int. Conf. Learn. Repr., 2019.
- Multicoated and folded graph neural networks with strong lottery tickets. In Learn. of Graphs Conf., 2023.
- Can we find strong lottery tickets in generative models? In Proc. AAAI Conf. on Artif. Intell., volume 37, pp. 3267–3275, 2023.
- Zhou, H., Lan, J., Liu, R., and Yosinski, J. Deconstructing lottery tickets: Zeros, signs, and the supermask. Proc. Adv. Neural Inform. Process. Syst., 32, 2019.
- Effective sparsification of neural networks with global sparsity constraint. In Proc. IEEE Comput. Soc. Conf. Comput. Vis. and Pattern Recognit., pp. 3599–3608, 2021.