- The paper demonstrates that randomly weighted neural networks host subnetworks capable of matching the performance of fully trained models.
- It introduces the edge-popup algorithm, which learns a score for each weight and uses those scores to select an effective subnetwork without ever updating the weights themselves.
- On ImageNet, a subnetwork found inside a randomly weighted Wide ResNet-50 matches the accuracy of a fully trained ResNet-34, suggesting that overparameterized networks contain architectural pathways relevant to model compression and neural architecture search (NAS).
Insights into Randomly Weighted Neural Networks
The paper "What's Hidden in a Randomly Weighted Neural Network?" explores the potential of randomly initialized neural networks, proposing that they inherently contain subnetworks capable of achieving notable performance without any weight optimization. This assertion challenges the conventional neural network paradigm that focuses on weight tuning through stochastic gradient descent.
Subnetwork Discovery in Wide Networks
The primary contribution is the proposal and empirical validation of a method for uncovering competitive subnetworks within a randomly initialized network. The authors argue that an overparameterized network contains a combinatorial multitude of subnetworks, some of which achieve performance equivalent to that of smaller networks with trained weights.
Through their experiments, the authors demonstrate that a subnetwork of a Wide ResNet-50 with random, untrained weights can match the performance of a fully trained ResNet-34 on the ImageNet dataset. This result is obtained with an algorithm termed edge-popup, which associates a non-negative score with each weight and, on every forward pass, retains only the highest-scoring weights in each layer. The selection is akin to pruning, but the weights themselves are never adjusted; only the scores are trained.
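To make the procedure concrete, here is a minimal PyTorch-style sketch of the idea: weights stay frozen at their random initialization, each weight carries a learnable score, the top-scoring fraction k of weights in a layer forms the subnetwork on the forward pass, and gradients reach the scores through a straight-through estimator. The class names (`GetSubnet`, `SubnetLinear`), the score initialization, and the default k=0.5 are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.autograd as autograd


class GetSubnet(autograd.Function):
    """Keep the top-k fraction of scores as a binary mask; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, k):
        # Rank scores and keep the highest k * numel of them.
        mask = torch.zeros_like(scores)
        _, idx = scores.flatten().sort()
        cutoff = int((1 - k) * scores.numel())
        flat_mask = mask.flatten()          # view into `mask`
        flat_mask[idx[cutoff:]] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat the masking as identity for the score gradient.
        return grad_output, None


class SubnetLinear(nn.Linear):
    """Linear layer whose random weights are frozen; only per-weight scores are trained."""

    def __init__(self, in_features, out_features, k=0.5):
        super().__init__(in_features, out_features, bias=False)
        self.k = k
        # Score initialization is an illustrative choice here.
        self.scores = nn.Parameter(torch.randn_like(self.weight).abs())
        self.weight.requires_grad = False   # weights stay at their random initialization

    def forward(self, x):
        mask = GetSubnet.apply(self.scores, self.k)
        return F.linear(x, self.weight * mask)
```

In use, the optimizer is handed only the score parameters, so a standard training loop updates which weights are selected rather than the weights themselves; as the scores evolve, edges "pop" in and out of the subnetwork.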
Theoretical and Empirical Justifications
Theoretically, the authors appeal to the infinite-width limit to suggest why random weights might house performant subnetworks: the number of possible weight combinations in a large network grows so quickly that some subnetwork is likely to perform well. This argument is supported by experiments showing that increasing network width and depth improves the chances of finding effective subnetworks. Results on CIFAR-10 and ImageNet illustrate that wider and deeper randomly initialized networks yield subnetworks whose performance is comparable to their trained counterparts.
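A back-of-the-envelope count illustrates the combinatorial point. The layer sizes and keep fraction below are hypothetical, not figures from the paper; the arithmetic only shows that even modest layers admit an astronomical number of candidate subnetworks.

```python
from math import comb, log10

# Hypothetical layer sizes and keep fraction, chosen only to illustrate the scale.
layer_sizes = [1_000, 1_000]
keep_fraction = 0.5

log10_total = 0.0
for n in layer_sizes:
    k = int(keep_fraction * n)
    log10_total += log10(comb(n, k))  # ways to pick which weights survive in this layer

print(f"roughly 10^{log10_total:.0f} candidate subnetworks")
```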
Implications for Neural Network Initialization and Structure
The findings prompt a reevaluation of the role of initialization and the necessity of weight training. The discovery of capable, untrained subnetworks indicates a possible reduction in computational resources required for network training. Moreover, it suggests new pathways for neural architecture search (NAS) that could focus more on structural discovery than traditional weight optimization routines.
The implications resonate beyond raw performance metrics. The results could have downstream effects in areas such as model compression, where subnetworks of densely connected networks operate at reduced resource costs. The findings also bear on the Lottery Ticket Hypothesis, emphasizing that such subnetworks are present from the outset rather than emerging only as products of iterative training and pruning.
Future Prospects
Future research should aim to refine subnetwork identification methods, improving both the speed and the accuracy with which these subnetworks are found. Further exploration is also warranted into how learned weights and discovered subnetworks might be combined for tasks that demand higher accuracy.
The paper challenges deeply held assumptions about the relationship between a network's parameters and its capacity to learn, potentially setting a foundation for future breakthroughs that leverage untrained, randomly weighted subnetworks for practical AI applications.