"Proving the Lottery Ticket Hypothesis: Pruning is All You Need"
Key Takeaways
- The paper proves that an over-parameterized, randomly initialized network contains a weight-subnetwork approximating any target ReLU network of bounded size.
- It demonstrates that pruning is as effective as weight optimization, showing that both weight- and neuron-subnetworks can match target performance.
- It highlights the computational hardness of finding such subnetworks, paralleling the known hardness of training neural networks.
Introduction
The paper "Proving the Lottery Ticket Hypothesis: Pruning is All You Need" addresses the hypothesis originally proposed by Frankle and Carbin, which posits that large, randomly initialized neural networks contain small subnetworks that perform as well as the full network. This work extends the hypothesis by proving that such subnetworks exist in over-parameterized networks without any further training. The authors distinguish two kinds of subnetworks, weight-subnetworks (obtained by pruning individual weights) and neuron-subnetworks (obtained by pruning entire neurons), and analyze the conditions under which each can match the performance of a target network.
Theoretical Contributions
The main theoretical contribution is a proof that, under mild assumptions, a sufficiently large randomly initialized neural network contains a subnetwork achieving accuracy comparable to that of a given target network, with no training of the surviving weights. Specifically, for any target ReLU network with bounded weights, a weight-subnetwork that closely approximates the target's function can be found inside a larger randomly initialized network.
The proof constructs the approximation by pruning a random network of depth twice that of the target, emulating each target layer with a pruned pair of random layers. For shallow (depth-two) networks, neuron-subnetworks are shown to be exactly as powerful as the random features model, a notable finding given the known limitations of random features in capturing complex functions.
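The core idea behind the layer-doubling construction can be illustrated with a toy experiment: a single target weight is approximated by the product of two random weights along one surviving path, with all other paths pruned away. The sketch below (the target value `w_target`, the widths, and the uniform initialization are illustrative assumptions, and the intermediate ReLU is ignored for clarity) shows how the best product match improves as the random network gets wider.

```python
import numpy as np

w_target = 0.73  # hypothetical target weight to approximate

def best_pruned_product(width, seed=0):
    """Approximate w_target by pruning all but one path through
    two random layers, keeping the path whose weight product
    u[i] * v[i] is closest to the target."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1, 1, width)  # random first-layer weights
    v = rng.uniform(-1, 1, width)  # random second-layer weights
    products = u * v
    i = np.argmin(np.abs(products - w_target))
    return abs(products[i] - w_target)

for width in (10, 100, 1000, 10000):
    print(width, best_pruned_product(width))
```

The over-parameterization is what makes pruning powerful here: with more random paths to choose from, some product lands very close to the target weight, mirroring the paper's polynomial-width requirement.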
Results on Neural Network Pruning
The authors conclude that pruning individual weights or entire neurons from an over-parameterized network is as effective as optimizing the network's weights for approximating target functions. In terms of expressive power, pruning is therefore an adequate substitute for training. The required size of the network before pruning depends only polynomially on the problem parameters.
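The "pruning instead of training" viewpoint can be made concrete: the weights stay at their random initialization, and the only free parameters are binary masks. A minimal sketch (the layer sizes are arbitrary, and the magnitude-based mask is just a stand-in heuristic; the paper's result is an existence statement, not an algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Fixed random weights: these are never trained.
W1 = rng.normal(size=(16, 2))
W2 = rng.normal(size=(1, 16))

def forward(x, mask1, mask2):
    # A weight-subnetwork: the binary masks are the only choice made.
    return (W2 * mask2) @ relu((W1 * mask1) @ x)

x = np.array([1.0, -0.5])
dense = forward(x, np.ones_like(W1), np.ones_like(W2))

# Illustrative pruning rule: keep the larger-magnitude half per layer.
mask1 = (np.abs(W1) >= np.median(np.abs(W1))).astype(float)
mask2 = (np.abs(W2) >= np.median(np.abs(W2))).astype(float)
sparse = forward(x, mask1, mask2)
print(dense, sparse)
```

Searching over masks replaces gradient descent over weights; the paper shows this search space is expressive enough, even though exploring it optimally is hard.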
Computational Implications
While demonstrating this expressive power, the paper also establishes a computational barrier: finding an optimal subnetwork is computationally hard, just as training a neural network is. This aligns with established hardness results for learning even simple networks.
The equivalence between neuron-subnetworks and random features also implies inherent limitations: such methods cannot efficiently approximate even a function as simple as a single ReLU neuron, a known barrier for random features.
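For context, the random features model that neuron-subnetworks are equated with freezes a random first layer and fits only the linear output layer. A minimal sketch (the feature count, sample size, and the one-dimensional target `sin(3x)` are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_samples = 200, 500
W = rng.normal(size=(n_features, 1))        # frozen random weights
b = rng.uniform(-1, 1, size=(n_features,))  # frozen random biases

X = rng.uniform(-1, 1, size=(n_samples, 1))
y = np.sin(3 * X[:, 0])  # hypothetical smooth 1-D target

# ReLU random features: only the output weights `a` are fitted,
# here by ordinary least squares.
Phi = np.maximum(X @ W.T + b, 0.0)
a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
mse = np.mean((Phi @ a - y) ** 2)
print(mse)
```

Smooth low-dimensional targets like this are easy for random features; the limitation cited above is that matching a single ReLU neuron to high accuracy in high dimensions requires exponentially many such features, and the equivalence transfers that barrier to neuron-subnetworks.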
Practical Implications and Future Work
Practically, the results motivate the development of more sophisticated pruning algorithms that are both theoretically sound and empirically effective, particularly for reducing inference costs and for offering alternatives to gradient-based optimization. Future research directions include improving the computational efficiency of finding good subnetworks and extending the results to other architectures, e.g., convolutional networks and ResNets.
Conclusion
The paper strengthens the theoretical foundation of the Lottery Ticket Hypothesis, making a strong case for pruning as a viable and potent method of neural network approximation. It opens pathways for further exploration of the pruning paradigm, which, backed by rigorous theoretical grounding, may also offer computational advantages.