Pruning from Scratch: A New Perspective on Network Pruning
The paper "Pruning from Scratch" explores an innovative direction in neural network pruning, an area devoted to reducing the computational overhead of deep learning models. It challenges the long-standing paradigm that a large model must be pre-trained before it can be pruned. Conventionally, the pipeline trains an over-parameterized model and then systematically removes its less significant units, such as channels, before fine-tuning the smaller network. This approach is inherently expensive because of the initial training of the large model.
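To make the conventional pipeline concrete, here is a minimal PyTorch-style sketch of one common channel-scoring scheme, ranking channels by the magnitude of their BatchNorm scale factors. The function names, the keep ratio, and the scoring rule are illustrative assumptions, not the specific criterion of any particular prior method.

```python
import torch
import torch.nn as nn

def score_channels_by_bn_scale(model: nn.Module):
    """Score each channel by the magnitude of its BatchNorm scale (gamma).
    This is one common importance proxy; pruning methods differ in the exact score."""
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            scores[name] = module.weight.detach().abs()
    return scores

def channels_to_keep(scores, keep_ratio=0.5):
    """Keep the top fraction of channels in each layer, ranked by score."""
    keep = {}
    for name, s in scores.items():
        k = max(1, int(keep_ratio * s.numel()))
        keep[name] = torch.topk(s, k).indices.sort().values
    return keep

# Conventional three-stage pipeline (sketch):
# 1) fully train a large model        -> the expensive pre-training step
# 2) score and prune channels         -> score_channels_by_bn_scale + channels_to_keep
# 3) fine-tune the pruned model       -> recover accuracy
```

The point of the sketch is only to show where the cost sits: step 1 dominates, and it is exactly the step the paper argues can be removed.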
Contrary to conventional wisdom, the authors show empirically that pre-training a model is not a prerequisite for discovering effective pruned architectures. Instead, the paper argues that pruned structures with potentially superior performance can emerge directly from networks with randomly initialized weights. This finding suggests that a fully trained large model may actually narrow the search space, limiting the diversity and potential quality of the pruned configurations that can be found.
The authors propose a pruning methodology, termed "pruning from scratch," that eliminates the pre-training phase. In their framework, channel importance is learned directly from randomly initialized weights, yielding diverse pruned structures without the computational cost of full pre-training. These claims are validated through experiments on the CIFAR-10 and ImageNet datasets, where the proposed method not only removes the pre-training burden but also achieves comparable or superior accuracy under equivalent computational budgets.
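The following sketch illustrates the core idea under stated assumptions: learnable per-channel gates are attached to a randomly initialized, frozen network and optimized with a sparsity penalty, after which layer widths are read off from the surviving gates. The class and function names, the L1 penalty, and the hyperparameters are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Conv block whose output channels are scaled by learnable gates.
    The conv/BN weights stay at their random initialization; only the gates train."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.gate = nn.Parameter(torch.ones(out_ch))  # channel importance
        for p in list(self.conv.parameters()) + list(self.bn.parameters()):
            p.requires_grad = False  # keep the random weights fixed

    def forward(self, x):
        x = torch.relu(self.bn(self.conv(x)))
        return x * self.gate.view(1, -1, 1, 1)

def gate_objective(task_loss, gates, sparsity_weight=1e-3):
    """Task loss plus an L1 penalty that pushes unimportant gates toward zero."""
    l1 = sum(g.abs().sum() for g in gates)
    return task_loss + sparsity_weight * l1
```

After a short optimization of the gates alone, per-layer channel counts are determined by the large-magnitude gates (subject to the overall computational budget), and the resulting slim architecture is then trained from scratch.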
The paper's experimental findings illustrate that pruned structures obtained from randomly initialized weights match or exceed the accuracy of those derived from pre-trained models while exhibiting greater structural diversity. Because the expensive pre-training stage is skipped entirely, the overall pruning pipeline is also substantially faster, a significant advantage given the increasing demand for efficient model deployment in resource-constrained environments such as mobile devices.
Furthermore, the paper examines the effect of pre-training on pruning outcomes, showing that pruned models derived from pre-trained weights tend to be structurally homogeneous. This homogeneity restricts the exploration of potentially more advantageous architectures. In contrast, pruning from scratch, by starting from random initializations, explores a broader search space and yields a wider variety of pruned network configurations.
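One rough way to see what "homogeneous" means here is to compare pruned structures as vectors of per-layer channel counts. The cosine-similarity measure in the sketch below is an assumption for illustration, not the paper's own diversity metric.

```python
import torch

def structure_similarity(widths_a, widths_b):
    """Cosine similarity between two pruned structures, each represented
    as a vector of per-layer channel counts. Values near 1.0 indicate
    nearly identical structures (i.e., low diversity)."""
    a = torch.tensor(widths_a, dtype=torch.float32)
    b = torch.tensor(widths_b, dtype=torch.float32)
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

# Structures pruned from the same pre-trained weights would score close to
# 1.0 against each other, while structures pruned from different random
# initializations would spread out more.
```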
From a practical perspective, this research provides a more computationally economical alternative for network pruning, which facilitates faster development cycles and reduced energy consumption. Theoretically, it invites the community to reassess entrenched beliefs around the necessity of pre-training in pruning. The implications are far-reaching, suggesting potential shifts in how neural architectures are optimized for efficiency.
The paper also sparks questions about the broader applicability of this approach, such as its effectiveness across different model types and tasks beyond image classification. Future research could explore these avenues and potentially integrate this paradigm within larger frameworks for automated neural architecture search (NAS).
In conclusion, "Pruning from Scratch" presents a compelling case for rethinking traditional network pruning strategies. The method opens a broader horizon for designing efficient models without the prohibitive costs of conventional approaches, setting a new benchmark for computational efficiency in the design and deployment of neural network models.