
Pruning Filters for Efficient ConvNets (1608.08710v3)

Published 31 Aug 2016 in cs.CV and cs.LG

Abstract: The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.

Pruning Filters for Efficient ConvNets

Introduction

The paper "Pruning Filters for Efficient ConvNets" addresses the significant computational and storage costs associated with convolutional neural networks (CNNs), which have grown more complex and voluminous over time. The primary objective is to provide a method to prune entire filters from CNNs, thus reducing computational overhead without severely impacting the accuracy. This work presents a structured pruning approach that avoids the irregular sparsity patterns associated with magnitude-based weight pruning.

Methodology

The proposed method focuses on reducing the number of filters in the convolutional layers rather than pruning individual weights. By removing entire filters and their corresponding feature maps, the computational costs can be reduced more straightforwardly. This approach leverages existing efficient BLAS libraries for dense matrix operations, avoiding the need for specialized sparse convolution libraries.

The process involves calculating the $\ell_1$-norm of each filter's weights to determine their relative importance. Filters with smaller $\ell_1$-norms, which are expected to produce weak activations, are pruned. This strategy allows for a balanced reduction in computational load without the need for sparse data structures. The method is applied to different models, including VGG-16 on CIFAR-10 and ResNet variants on CIFAR-10 and ImageNet.
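To make the ranking and removal step concrete, the sketch below computes per-filter $\ell_1$-norms for one convolutional layer and rebuilds that layer and the following one with the low-norm filters removed. It is a minimal illustration in PyTorch, assuming plain back-to-back Conv2d layers; the pruning ratio and function names are illustrative choices, not details taken from the paper (in a real network, any intervening batch-norm layers would also need their channels selected).

```python
import torch
import torch.nn as nn

def l1_filter_ranking(conv: nn.Conv2d, prune_ratio: float = 0.3):
    """Rank the filters of a Conv2d layer by the l1-norm of their weights
    and return the indices of the filters to keep."""
    # conv.weight has shape (out_channels, in_channels, kH, kW);
    # sum absolute weights over everything except the output-channel axis.
    l1_norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = int(conv.out_channels * (1.0 - prune_ratio))
    keep_idx = torch.argsort(l1_norms, descending=True)[:n_keep]
    return torch.sort(keep_idx).values  # preserve the original filter order

def prune_conv_pair(conv: nn.Conv2d, next_conv: nn.Conv2d, keep_idx):
    """Build dense replacement layers with the pruned filters removed.

    Removing filter j in `conv` also removes input channel j of `next_conv`,
    which is why whole-filter pruning stays a dense operation."""
    pruned = nn.Conv2d(conv.in_channels, len(keep_idx), conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()

    pruned_next = nn.Conv2d(len(keep_idx), next_conv.out_channels,
                            next_conv.kernel_size, stride=next_conv.stride,
                            padding=next_conv.padding,
                            bias=next_conv.bias is not None)
    pruned_next.weight.data = next_conv.weight.data[:, keep_idx].clone()
    if next_conv.bias is not None:
        pruned_next.bias.data = next_conv.bias.data.clone()
    return pruned, pruned_next
```

Because the resulting layers are ordinary smaller Conv2d modules, inference proceeds through standard dense matrix multiplication with no sparse bookkeeping.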

Results

The empirical results demonstrate that the proposed filter pruning method can achieve significant reductions in FLOPs (up to 34% for VGG-16 and up to 38.6% for ResNet-110) while maintaining accuracy levels close to the original models. For example, VGG-16 on CIFAR-10 retains its performance with only a 0.15% drop in accuracy after a 34.2% reduction in FLOPs. Similarly, ResNet-110 shows a minimal drop in accuracy with a substantial FLOP reduction.
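The source of these savings is straightforward to work out: pruning a fraction of the filters in layer $i$ removes that fraction of layer $i$'s multiply-accumulates and, because the corresponding feature maps disappear, the same fraction of layer $i+1$'s input channels. The arithmetic below uses hypothetical layer sizes chosen for illustration, not figures from the paper.

```python
def conv_flops(in_ch, out_ch, k, out_h, out_w):
    """Approximate multiply-accumulate count of a k x k convolutional layer."""
    return in_ch * out_ch * k * k * out_h * out_w

# Hypothetical back-to-back 3x3 layers on a 32x32 feature map: cutting the
# first layer's filters from 256 to 192 also shrinks the second layer's
# input channels, so both layers become cheaper.
before = conv_flops(128, 256, 3, 32, 32) + conv_flops(256, 256, 3, 32, 32)
after  = conv_flops(128, 192, 3, 32, 32) + conv_flops(192, 256, 3, 32, 32)
print(f"FLOP reduction: {1 - after / before:.1%}")  # 25.0% for a 25% filter cut
```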

The paper also highlights the relative sensitivity of different layers to pruning. Layers closer to the input and output are generally more sensitive, necessitating a careful selection of per-layer pruning rates. Additionally, the $\ell_1$-norm criterion was compared against alternatives such as activation-based feature map pruning using mean- and standard-deviation-based statistics, as well as random and largest-norm selection.
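A simple way to carry out such a sensitivity study is to prune one layer at a time at increasing ratios and record validation accuracy without retraining. The sketch below assumes hypothetical `prune_layer` and `evaluate` helpers; it illustrates the analysis loop rather than reproducing the authors' exact procedure.

```python
def layer_sensitivity(model, layer_names, ratios, evaluate, prune_layer):
    """Prune each named layer independently at several ratios and record accuracy.

    `prune_layer(model, name, ratio)` is assumed to return a copy of the model
    with only that layer pruned; `evaluate(model)` is assumed to return
    validation accuracy. Both are illustrative helpers, not paper-defined APIs.
    """
    sensitivity = {}
    for name in layer_names:
        sensitivity[name] = []
        for r in ratios:
            pruned_model = prune_layer(model, name, r)   # prune only this layer
            acc = evaluate(pruned_model)                 # no retraining here
            sensitivity[name].append((r, acc))
    return sensitivity

# Layers whose accuracy falls off quickly (often those nearest the input and
# output) would then be assigned smaller pruning ratios or left untouched.
```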

Implications

The practical implications of this research are significant, particularly for deploying CNNs in resource-constrained environments like mobile devices and embedded systems. The reduction in computational costs can lead to more efficient real-time applications and lower energy consumption.

Theoretically, this work contributes to understanding the inherent redundancies in deep networks and provides a structured approach to reducing these redundancies. This method could also be integrated with other model compression techniques like quantization and binarization to further enhance efficiency.

Future Directions

Future research could explore automated methods for determining optimal pruning rates across different layers and network architectures. Additionally, the approach could be extended to other neural network architectures beyond CNNs, such as recurrent neural networks (RNNs) or transformers.

Moreover, combining filter pruning with other techniques like network distillation and low-rank approximations could yield even more compact and efficient models. Investigating the impact of different regularization techniques during training to facilitate easier pruning could also be a fruitful area of study.

Conclusion

The paper "Pruning Filters for Efficient ConvNets" presents a structured and effective method for reducing the computational demands of CNNs by pruning entire filters. This approach avoids the complexities associated with sparse operations and demonstrates significant FLOP reductions with minimal impact on accuracy. The results underscore the potential for deploying compressed models in real-time applications, suggesting extensive future research directions to enhance neural network efficiency further.

Authors (5)
  1. Hao Li (803 papers)
  2. Asim Kadav (22 papers)
  3. Igor Durdanovic (1 paper)
  4. Hanan Samet (10 papers)
  5. Hans Peter Graf (9 papers)
Citations (3,504)