SNIP: Single-shot Network Pruning based on Connection Sensitivity (1810.02340v2)

Published 4 Oct 2018 in cs.CV and cs.LG

Abstract: Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.

Authors (3)
  1. Namhoon Lee (19 papers)
  2. Thalaiyasingam Ajanthan (33 papers)
  3. Philip H. S. Torr (219 papers)
Citations (1,074)

Summary

An Analysis of SNIP: Single-shot Network Pruning based on Connection Sensitivity

In the domain of neural network pruning, an efficient method for reducing network complexity without sacrificing performance is highly sought after. Existing methods mostly rely on iterative optimization procedures with prune-retrain cycles and non-trivial hyperparameters; such approaches are computationally expensive and hinder the practical deployment of pruning across architectures. The paper "SNIP: Single-shot Network Pruning based on Connection Sensitivity" by Namhoon Lee, Thalaiyasingam Ajanthan, and Philip H. S. Torr presents a straightforward alternative that prunes a network once, at random initialization and prior to any training, by computing the sensitivity of its connections to the task at hand.

Methodology: Connection Sensitivity and Single-shot Pruning

SNIP introduces a novel criterion for identifying structurally important connections, termed "connection sensitivity," which measures the influence of individual connections on the loss function. Using a single mini-batch of data at initialization, the method computes a saliency score for each connection as the absolute value of the gradient of the loss with respect to that connection's presence (an auxiliary indicator variable), normalized over all connections in the network. This sensitivity measure identifies the connections most critical for the given task and allows redundant ones to be pruned in a single step.
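
Stated compactly (a condensed restatement of the paper's criterion, where c denotes the auxiliary indicator variables attached to the weights w, D is a mini-batch of data, and m is the number of connections):

```latex
% Sensitivity of the loss to connection j, evaluated with all connections active (c = 1):
g_j(\mathbf{w}; \mathcal{D}) = \left.\frac{\partial L(\mathbf{c} \odot \mathbf{w}; \mathcal{D})}{\partial c_j}\right|_{\mathbf{c} = \mathbf{1}},
\qquad
% Saliency of connection j: the normalized magnitude of that sensitivity.
s_j = \frac{\lvert g_j(\mathbf{w}; \mathcal{D}) \rvert}{\sum_{k=1}^{m} \lvert g_k(\mathbf{w}; \mathcal{D}) \rvert}.
```

Only the connections with the largest saliencies s_j are kept, with the number retained set by the desired sparsity level; all others are removed before training begins.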

The main advantages of this approach stem from its simplicity:

  • No Pretraining or Complex Pruning Schedules: SNIP eliminates the need for pretraining and for iterative prune-retrain schedules. The network is pruned once at initialization, and the resulting sparse network is trained in the standard manner (a sketch of this procedure follows the list).
  • Architecture Robustness: Because the saliency criterion measures the structural importance of connections rather than relying on trained weight magnitudes, it transfers across different architectures, including convolutional, residual, and recurrent networks.
  • Interpretability: By pruning based on data-dependent sensitivity, the method allows for introspection on which connections are most relevant for a given task.
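
The end-to-end recipe is short enough to sketch. The following PyTorch snippet is a minimal illustration rather than the authors' released code: the helper names (snip_masks, apply_masks) and the keep_ratio argument are our own, and it relies on the identity that the gate gradient at c = 1 equals w ⊙ ∂L/∂w, so one ordinary backward pass on a single mini-batch scores every connection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def snip_masks(model: nn.Module, inputs, targets, keep_ratio=0.05):
    """Score all connections on one mini-batch at initialization and
    return a {parameter name: binary mask} dict keeping the top fraction."""
    model.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()

    # Connection sensitivity |w * dL/dw| (equal to |dL/dc| at c = 1),
    # computed for weight tensors only (conv kernels, linear matrices).
    sens = {name: (p * p.grad).abs()
            for name, p in model.named_parameters()
            if p.dim() > 1 and p.grad is not None}

    flat = torch.cat([s.flatten() for s in sens.values()])
    saliency = flat / flat.sum()                      # normalized scores s_j
    k = max(1, int(keep_ratio * saliency.numel()))
    threshold = torch.topk(saliency, k).values.min()  # k-th largest saliency

    return {name: (s / flat.sum() >= threshold).float()
            for name, s in sens.items()}

def apply_masks(model: nn.Module, masks):
    """Zero out pruned weights and keep them at zero during standard training."""
    for name, p in model.named_parameters():
        if name in masks:
            p.data.mul_(masks[name])
            # Mask the gradient so pruned connections never receive updates.
            p.register_hook(lambda grad, m=masks[name]: grad * m)

# Usage: prune once at initialization, then train the sparse network as usual.
# (model, train_batch, train_loader, and optimizer are assumed to exist.)
# masks = snip_masks(model, *train_batch, keep_ratio=0.05)
# apply_masks(model, masks)
# ... standard training loop over train_loader with optimizer ...
```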

Experimental Results

The paper evaluates SNIP on several image classification tasks, including MNIST, CIFAR-10, and Tiny-ImageNet, using various network architectures such as LeNet, AlexNet, VGG, and Wide ResNets. The results demonstrate that SNIP can achieve extreme sparsity levels with marginal or no loss in classification accuracy.

  1. MNIST and CIFAR-10 Performance: For MNIST, the method shows that pruned networks can maintain performance across varying levels of sparsity with less than 1% increase in error rates even at 99% sparsity. For CIFAR-10, the results are similarly promising, with pruned VGG-like and WRN-22-8 architectures maintaining comparable or even better performance than their dense counterparts.
  2. Generalizability Across Architectures: The authors provide strong empirical evidence of the versatility of SNIP by successfully applying it across convolutional, residual, and recurrent networks without the need for architecture-specific adjustments. This robustness is a noteworthy characteristic of the method.

Insights and Implications

A key contribution of SNIP is its ability to provide insights into the importance of connections from the very beginning of training. This data-driven criterion supports the hypothesis that overparameterized networks contain many redundant parameters that can be identified and pruned early on. The findings suggest that such pruned networks not only reduce computational and memory overhead but also enhance model interpretability by focusing on the most relevant parts of the data.

Moreover, the authors' examination of the retained connections confirms that SNIP indeed prunes irrelevant connections while preserving those most crucial for the task. This also opens avenues for future research in domain adaptation, transfer learning, and structural regularization using sparse models derived via SNIP.

Future Directions

Future work may explore integrating SNIP with advanced optimization algorithms and extending its application to larger-scale datasets. Additionally, an in-depth theoretical analysis of the effects of early pruning on learning dynamics could further substantiate the empirical findings presented.

Conclusion

In summary, the SNIP approach offers a pragmatic and effective solution for neural network pruning. Its simplicity, versatility, and interpretability make it a valuable method for reducing network complexity while maintaining performance. The detailed experimental results substantiate SNIP's efficacy, highlighting its potential for broad applicability in machine learning applications.