
Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot (2009.11094v2)

Published 22 Sep 2020 in cs.LG and stat.ML

Abstract: Network pruning is a method for reducing test-time computational resource requirements with minimal performance degradation. Conventional wisdom of pruning algorithms suggests that: (1) Pruning methods exploit information from training data to find good subnetworks; (2) The architecture of the pruned network is crucial for good performance. In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods and surprisingly find that: (1) A set of methods which aims to find good subnetworks of the randomly-initialized network (which we call "initial tickets"), hardly exploits any information from the training data; (2) For the pruned networks obtained by these methods, randomly changing the preserved weights in each layer, while keeping the total number of preserved weights unchanged per layer, does not affect the final performance. These findings inspire us to choose a series of simple \emph{data-independent} prune ratios for each layer, and randomly prune each layer accordingly to get a subnetwork (which we call "random tickets"). Experimental results show that our zero-shot random tickets outperform or attain a similar performance compared to existing "initial tickets". In addition, we identify one existing pruning method that passes our sanity checks. We hybridize the ratios in our random ticket with this method and propose a new method called "hybrid tickets", which achieves further improvement. (Our code is publicly available at https://github.com/JingtongSu/sanity-checking-pruning)

Overview of "Sanity-Checking Pruning Methods: Random Tickets Can Win the Jackpot"

The paper "Sanity-Checking Pruning Methods: Random Tickets Can Win the Jackpot" investigates prevailing notions about network pruning and presents findings that challenge conventional understanding. Network pruning reduces test-time computational cost while keeping performance degradation minimal. Conventional wisdom holds that pruning must exploit training data and that the pruned network's architecture is key to robust post-pruning performance. The paper rigorously tests these assumptions on several recent unstructured pruning methods.

Key Content and Findings

The authors focus on two central assumptions of common network pruning procedures:

  1. Data Dependency: The belief that pruning methods use information from training data in the pruning step to identify effective subnetworks.
  2. Architecture Dependency: The belief that the connections within the pruned network are vital for retaining performance.

To test these assumptions, the authors utilize several sanity-checking methods. Findings reveal surprising results:

  1. Data Independence: Using corrupted data, such as random labels or random pixels, in the pruning step does not impair the ability to identify effective subnetworks of the randomly-initialized network, termed "initial tickets." Subnetworks obtained from corrupted datasets achieve performance comparable to those derived from genuine data.
  2. Architecture Independence: The precise connection pattern of the pruned subnetworks ("initial tickets") has limited impact. The paper introduces a "layerwise rearrange" sanity check, which randomly permutes the preserved weights within each layer while keeping their count per layer unchanged, and observes negligible performance degradation.
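The layerwise rearrange check can be sketched in a few lines. The snippet below is an illustrative NumPy sketch (not the authors' code): it shuffles each layer's binary pruning mask so the number of preserved weights per layer stays fixed while the exact connection pattern is randomized.

```python
import numpy as np

def layerwise_rearrange(masks, seed=0):
    """Shuffle each layer's binary pruning mask in place of the original
    pattern: the per-layer count of kept weights is preserved, but which
    positions are kept is re-drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    shuffled = []
    for m in masks:
        flat = m.ravel().copy()
        rng.shuffle(flat)                  # permute kept/pruned positions
        shuffled.append(flat.reshape(m.shape))
    return shuffled

# toy example: one 4x4 layer mask keeping 8 of 16 weights
mask = np.zeros(16)
mask[:8] = 1
mask = mask.reshape(4, 4)
new_mask = layerwise_rearrange([mask])[0]
assert new_mask.sum() == mask.sum()        # same sparsity per layer
```

If an initial ticket's accuracy survives this rearrangement, the specific subnetwork structure found by the pruning criterion carried little information beyond the per-layer sparsity levels.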

These findings lead to the proposal of "random tickets": subnetworks obtained by randomly pruning each layer according to simple, data-independent layerwise ratios, which match or exceed the performance of elaborately derived initial tickets. Furthermore, one existing pruning method, which leverages partially-trained networks, passes the sanity checks; hybridizing the random-ticket layerwise ratios with this method yields "hybrid tickets," which improve on the partially-trained approach further.
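Constructing a zero-shot random ticket then amounts to drawing a uniform random mask per layer for a chosen set of data-independent keep ratios. The layer shapes and ratio schedule below are purely illustrative assumptions, not the paper's actual "smart ratio" schedule:

```python
import numpy as np

def random_ticket_masks(layer_shapes, keep_ratios, seed=0):
    """Zero-shot 'random ticket': for each layer, keep a fixed,
    data-independent fraction of weights chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    masks = []
    for shape, ratio in zip(layer_shapes, keep_ratios):
        n = int(np.prod(shape))
        k = int(round(ratio * n))          # number of weights to keep
        flat = np.zeros(n)
        flat[rng.choice(n, size=k, replace=False)] = 1
        masks.append(flat.reshape(shape))
    return masks

# hypothetical conv-layer shapes and a decreasing keep-ratio schedule
shapes = [(64, 3, 3, 3), (128, 64, 3, 3), (256, 128, 3, 3)]
ratios = [0.5, 0.3, 0.1]                   # illustrative only
masks = random_ticket_masks(shapes, ratios)
```

No training data touches this procedure; only the per-layer ratios matter, which is exactly the point the sanity checks establish.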

Experimental Analysis

Experiments are conducted on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets using architectures like VGG and ResNet. Results demonstrate that "random tickets" and "hybrid tickets" often outperform standard methods across varying sparsity levels. The authors briefly touch upon iterative pruning, noting its negligible benefit for initial tickets compared to its substantial improvements for partially-trained tickets.

Implications

The insights have practical implications for the design of future pruning methods. By showing that random, data-independent pruning strategies can surpass existing approaches, the research prompts a reevaluation of how pruning algorithms are designed. It suggests that further efficiency gains are possible without sacrificing performance by choosing layerwise pruning ratios intelligently, and that these findings may carry over to neural architecture search (NAS) systems.

Future Directions

The paper opens pathways for exploring:

  1. Data-independent Methods: Reexamining whether data-dependent criteria in the pruning step actually add value, reinforcing a reassessment of conventional practice in the field.
  2. Architecture Relevance: Further investigating which architectural properties genuinely influence a subnetwork's ability to be retrained, particularly in NAS strategies built around less traditional criteria.

In summary, the paper provides a composite understanding of pruning processes, urging the community to reconsider long-held beliefs and explore innovative pruning pathways to further advance deep learning efficiency.

Authors (7)
  1. Jingtong Su
  2. Tianle Cai
  3. Tianhao Wu
  4. Ruiqi Gao
  5. Liwei Wang
  6. Jason D. Lee
  7. Yihang Chen
Citations (82)