A Closer Look at Structured Pruning for Neural Network Compression (1810.04622v3)

Published 10 Oct 2018 in stat.ML, cs.CV, and cs.LG

Abstract: Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network. However, the efficacy of structured pruning has largely evaded scrutiny. In this paper, we examine ResNets and DenseNets obtained through structured pruning-and-tuning and make two interesting observations: (i) reduced networks---smaller versions of the original network trained from scratch---consistently outperform pruned networks; (ii) if one takes the architecture of a pruned network and then trains it from scratch it is significantly more competitive. Furthermore, these architectures are easy to approximate: we can prune once and obtain a family of new, scalable network architectures that can simply be trained from scratch. Finally, we compare the inference speed of reduced and pruned networks on hardware, and show that reduced networks are significantly faster. Code is available at https://github.com/BayesWatch/pytorch-prunes.

An Examination of Structured Pruning in Neural Network Compression

In "A Closer Look at Structured Pruning for Neural Network Compression," Crowley et al. provide a nuanced examination of structured pruning as a technique for compressing neural networks. The central aim of the research is to evaluate how effective today's structured pruning techniques actually are, and to draw out observations that could inform future directions in the field. Two key observations shape the narrative: reduced networks, i.e., smaller architectures trained from scratch, consistently outperform pruned networks; and taking the architecture of a pruned network and training it from scratch yields a notably more competitive model.

Core Findings and Analysis

The researchers carried out extensive pruning experiments using popular models such as ResNets and DenseNets on datasets including CIFAR-10 and ImageNet. Key findings include:

  1. Performance of Pruned vs. Reduced Networks: The paper shows that, contrary to common practice, pruned-and-tuned models are consistently outperformed by reduced models trained from scratch within comparable parameter budgets, and the gap widens as networks are compressed to more extreme levels. Techniques such as L1 pruning and the more sophisticated Fisher pruning were evaluated against reduced networks of varied structure (differing depth, width, and bottleneck configuration); a minimal sketch of L1 channel ranking follows this list.
  2. Training Pruned Networks from Scratch: When architectures derived from structured pruning are retrained from scratch, they deliver significant performance improvements over their pruned-and-fine-tuned counterparts. This indicates that although pruning may be suboptimal as a compress-and-tune procedure, it can still yield useful insights for architectural design.
  3. Hardware Performance and Inference Acceleration: From a practical standpoint, the paper shows that structured pruning can actually hurt inference efficiency on general-purpose hardware. Reduced networks, particularly those scaled down in depth, achieve higher throughput and faster inference on CPUs and GPUs than structurally pruned networks (a rough timing sketch also follows this list).
  4. Practical Derivation of Architecture Families: A promising byproduct of the paper is a scalable approach for deriving new network families from pruned architectures. These "copycat" architectures perform comparably to their Fisher-pruned counterparts and outperform alternatives such as SNIP networks when trained from scratch.
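
To ground the first two findings, here is a minimal PyTorch sketch of magnitude-based (L1) channel ranking. It is not the authors' implementation (their code lives at https://github.com/BayesWatch/pytorch-prunes); the layer sizes and the keep ratio are illustrative assumptions. The sketch scores each output channel of a convolution by the L1 norm of its filters, selects the top-scoring channels, and then, echoing finding 2, instantiates a fresh, narrower layer with the surviving width so that the architecture can simply be trained from scratch.

```python
# Minimal sketch of L1-norm channel ranking for structured pruning.
# Not the authors' implementation; layer sizes and keep_ratio are
# illustrative assumptions.
import torch
import torch.nn as nn


def l1_channel_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Score each output channel by the L1 norm of its filter weights."""
    # conv.weight has shape (out_channels, in_channels, kH, kW)
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))


def channels_to_keep(conv: nn.Conv2d, keep_ratio: float = 0.5) -> torch.Tensor:
    """Return indices of the highest-scoring output channels."""
    scores = l1_channel_scores(conv)
    n_keep = max(1, int(keep_ratio * scores.numel()))
    return torch.topk(scores, n_keep).indices.sort().values


# Rank channels of a trained layer, then build a fresh, narrower layer with
# the surviving width; the paper's point is that this architecture, trained
# from scratch, is competitive with the pruned-and-tuned original.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
keep = channels_to_keep(conv, keep_ratio=0.5)
fresh_conv = nn.Conv2d(64, keep.numel(), kernel_size=3, padding=1)
print(f"kept {keep.numel()} of {conv.out_channels} output channels")
```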
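
The hardware comparison in finding 3 is about wall-clock behaviour rather than parameter counts. Below is a rough timing harness, not the paper's benchmark: the irregular widths assigned to the "pruned-like" model and the uniform widths of the "reduced" model are made-up values, and a real comparison should match parameter budgets, use the actual pruned architectures, and profile on the target hardware.

```python
# Rough CPU timing sketch: a network with irregular per-layer widths (as
# pruning tends to leave behind) versus a uniformly reduced network.
# All widths below are illustrative assumptions, not the paper's models.
import time
import torch
import torch.nn as nn


def conv_stack(widths, num_classes=10):
    layers, in_ch = [], 3
    for w in widths:
        layers += [nn.Conv2d(in_ch, w, 3, padding=1), nn.ReLU(inplace=True)]
        in_ch = w
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)


def mean_latency(model, n_iters=50, batch_size=8):
    model.eval()
    x = torch.randn(batch_size, 3, 32, 32)
    with torch.no_grad():
        for _ in range(5):  # warm-up
            model(x)
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
    return (time.perf_counter() - start) / n_iters


pruned_like = conv_stack([61, 23, 97, 41, 118, 37])  # irregular, pruning-style widths
reduced = conv_stack([48, 48, 64, 64, 80, 80])       # regular, uniformly scaled widths
print(f"pruned-like: {1000 * mean_latency(pruned_like):.2f} ms/batch")
print(f"reduced:     {1000 * mean_latency(reduced):.2f} ms/batch")
```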

Implications and Future Directions

The findings in this paper suggest that structured pruning methods are better viewed as architecture search tools than as standalone compression strategies, and they emphasize that reduced models offer a simpler yet efficient alternative. For researchers and practitioners, this could mark a shift in which model compression pivots towards systematic architectural adjustment rather than aggressive pruning followed by fine-tuning.

As computational efficiency remains a critical concern, especially for deployment in resource-constrained environments, this research highlights a gap in hardware and library support for executing the sparse or irregularly shaped networks that pruning produces. The insights also extend across tasks: retrained architectures could be integrated into settings such as transfer learning and low-data regimes.

In conclusion, while structured pruning remains in widespread use as a post-hoc method for network optimization, its real value may lie in approximating and reusing its architectural outputs in forward design. The research advocates more rigorous evaluation of pruning's impact, encouraging a balanced view of theoretical compression and practical deployment efficiency. The code released by the authors may inspire further work that balances accuracy, hardware utilization, and configurability in neural architectures.

Authors (4)
  1. Elliot J. Crowley (27 papers)
  2. Jack Turner (9 papers)
  3. Amos Storkey (75 papers)
  4. Michael O'Boyle (15 papers)
Citations (30)