
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training (2202.02643v1)

Published 5 Feb 2022 in cs.LG, cs.AI, and cs.CV

Abstract: Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive by either post-training pruning or sparse training. In this paper, we focus on sparse training and highlight a perhaps counter-intuitive finding, that random pruning at initialization can be quite powerful for the sparse training of modern neural networks. Without any delicate pruning criteria or carefully pursued sparsity structures, we empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent. There are two key factors that contribute to this revival: (i) the network sizes matter: as the original dense networks grow wider and deeper, the performance of training a randomly pruned sparse network will quickly grow to matching that of its dense equivalent, even at high sparsity ratios; (ii) appropriate layer-wise sparsity ratios can be pre-chosen for sparse training, which shows to be another important performance booster. Simple as it looks, a randomly pruned subnetwork of Wide ResNet-50 can be sparsely trained to outperforming a dense Wide ResNet-50, on ImageNet. We also observed such randomly pruned networks outperform dense counterparts in other favorable aspects, such as out-of-distribution detection, uncertainty estimation, and adversarial robustness. Overall, our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning. Our source code can be found at https://github.com/VITA-Group/Random_Pruning.

Analyzing the Effectiveness of Random Pruning for Sparse Training

Overview

In this paper, the authors revisit random pruning in the context of neural network sparsity and training efficiency. Traditionally treated as a baseline, random pruning has been overshadowed by more sophisticated techniques that rely on carefully designed pruning criteria. The paper presents a counterintuitive finding: random pruning applied at initialization can match the performance of dense neural networks, even in large-scale settings such as ImageNet. The authors attribute this to two principal factors: increasing network size and the choice of appropriate layer-wise sparsity ratios before training begins.

Main Findings

The findings provide insight into how random pruning can effectively be utilized:

  1. Importance of Network Size: Randomly pruned networks become increasingly competitive as network scale grows. For smaller networks, it is hard to find randomly pruned subnetworks that match their dense counterparts; as networks become wider and deeper, the gap between pruned and dense models shrinks and eventually closes, even at high sparsity levels.
  2. Layer-wise Sparsity Ratios: Choosing appropriate layer-wise sparsity ratios before training begins significantly improves the performance of randomly pruned networks. The paper compares several pre-defined sparsity schedules and finds that some can push pruned networks beyond their dense counterparts (a minimal sketch of random pruning with pre-chosen layer-wise ratios follows this list).
  3. Comprehensive Evaluation: Randomly pruned networks not only achieve strong predictive accuracy but also show gains in out-of-distribution detection, adversarial robustness, and uncertainty estimation, which makes them attractive in real-world applications where robustness matters.
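
To make the recipe concrete, below is a minimal PyTorch sketch of random pruning at initialization: each prunable layer receives a pre-chosen sparsity ratio, surviving weights are selected uniformly at random, and the resulting masks are kept fixed throughout training. This is an illustration under stated assumptions, not the authors' released code (which is linked in the abstract); the `layer_sparsity` dictionary and helper names are hypothetical.

```python
# Minimal sketch of random pruning at initialization (illustrative; not the paper's code).
import torch
import torch.nn as nn

def random_masks(model: nn.Module, layer_sparsity: dict) -> dict:
    """Create a fixed random binary mask per prunable layer.

    layer_sparsity maps parameter name -> fraction of weights to remove,
    e.g. a uniform ratio or a pre-chosen layer-wise schedule such as ERK.
    """
    masks = {}
    for name, param in model.named_parameters():
        if name not in layer_sparsity:
            continue  # leave biases, norm layers, etc. dense
        n = param.numel()
        n_keep = n - int(layer_sparsity[name] * n)
        # No magnitude, gradient, or saliency criterion: survivors are chosen at random.
        keep_idx = torch.randperm(n, device=param.device)[:n_keep]
        mask = torch.zeros(n, device=param.device)
        mask[keep_idx] = 1.0
        masks[name] = mask.view_as(param)
    return masks

def apply_masks(model: nn.Module, masks: dict) -> None:
    """Zero out pruned weights; call after every optimizer step so the mask stays fixed."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
```

Sparse training then proceeds exactly like dense training, except that the masks are applied after each weight update so the randomly chosen sparsity pattern never changes.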

Methodological Insights

The authors conducted an extensive empirical study involving various architectures and datasets, including CIFAR-10, CIFAR-100, and ImageNet, using models such as ResNet and Wide ResNet-50. Tracking model performance across different sparsity levels and network sizes provides a comprehensive picture of random pruning's impact.

Moreover, the paper evaluated broader metrics beyond predictive accuracy, such as adversarial robustness and uncertainty estimation, presenting a holistic view of the benefits that random pruning may confer in practical scenarios.

Theoretical and Practical Implications

This research highlights two critical implications for how we understand and apply sparse training:

  • Theoretical Implication: It challenges the assumption that sophisticated pruning methods are necessary to achieve high performance in neural networks. The results suggest that simpler, random approaches could suffice, especially as network sizes continue to grow.
  • Practical Implication: From a computational efficiency perspective, random pruning might offer significant savings in both the training and inference phases. This is particularly relevant for large-scale networks, where computational resources can be a bottleneck (a back-of-the-envelope sketch follows this list).
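
As a rough illustration of the potential savings (a back-of-the-envelope sketch, not a measurement from the paper), the snippet below counts how many parameters survive a given global sparsity for torchvision's wide_resnet50_2, used here as a stand-in for the Wide ResNet-50 studied on ImageNet. Translating parameter sparsity into wall-clock speedups additionally requires sparse-aware kernels or hardware.

```python
# Back-of-the-envelope parameter count under a global sparsity ratio (illustrative).
from torchvision.models import wide_resnet50_2

model = wide_resnet50_2()  # stand-in for the Wide ResNet-50 used in the paper
dense_params = sum(p.numel() for p in model.parameters())

for sparsity in (0.5, 0.8, 0.9):
    remaining = int(dense_params * (1.0 - sparsity))
    print(f"sparsity {sparsity:.0%}: {remaining / 1e6:.1f}M of {dense_params / 1e6:.1f}M parameters remain")
```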

Future Directions

The findings of this paper open up new avenues for research into sparse training methodologies:

  • Additional Layer-wise Sparsity Strategies: Studying a wider range of layer-wise sparsity schedules could refine our understanding of how best to apply random pruning.
  • Generalizing to Other Architectures: Investigating whether the observed phenomena hold across a broader array of network architectures and learning paradigms could strengthen and expand the applicability of the findings.
  • Understanding Gradient Flow: A deeper theoretical understanding of how random sparsity affects gradient flow during training could explain why large, randomly pruned networks remain performant (a small measurement sketch follows this list).
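
One common proxy for gradient flow in the sparse-training literature is the norm of the loss gradient restricted to the surviving weights at initialization. The sketch below computes this quantity for a masked model, reusing the mask dictionary from the earlier sketch; it is an assumption for illustration, not the paper's measurement protocol.

```python
# Illustrative gradient-flow proxy: gradient norm over surviving weights (assumption, not the paper's protocol).
import torch
import torch.nn as nn

def gradient_flow(model: nn.Module, masks: dict, inputs, targets) -> float:
    """L2 norm of the loss gradient over unpruned weights for one batch."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    squared_norm = 0.0
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        grad = param.grad
        if name in masks:  # only count gradients of weights that survived pruning
            grad = grad * masks[name]
        squared_norm += grad.pow(2).sum().item()
    return squared_norm ** 0.5
```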

Conclusion

This paper makes a substantial contribution by demonstrating the unexpected efficacy of random pruning in deep neural network training. By carefully selecting sparsity ratios and leveraging the benefits of increased network size, random pruning emerges as a competitive approach, challenging the established preference for more sophisticated pruning techniques. These insights could better inform both the future direction of pruning research and the application of neural networks in resource-constrained environments. The open-source availability of the code aids in the reproducibility and further exploration of these findings.

Authors (7)
  1. Shiwei Liu
  2. Tianlong Chen
  3. Xiaohan Chen
  4. Li Shen
  5. Decebal Constantin Mocanu
  6. Zhangyang Wang
  7. Mykola Pechenizkiy
Citations (98)