An Entropy-based Pruning Method for CNN Compression (1706.05791v1)

Published 19 Jun 2017 in cs.CV

Abstract: This paper aims to simultaneously accelerate and compress off-the-shelf CNN models via a filter pruning strategy. The importance of each filter is first evaluated by the proposed entropy-based method. Then several unimportant filters are discarded to obtain a smaller CNN model. Finally, fine-tuning is adopted to recover the generalization ability damaged during filter pruning. Our method can reduce the size of intermediate activations, which dominate most of the memory footprint during the model training stage but have received less attention in previous compression methods. Experiments on the ILSVRC-12 benchmark demonstrate the effectiveness of our method. Compared with previous filter importance evaluation criteria, our entropy-based method obtains better performance. We achieve a 3.3x speed-up and 16.64x compression on VGG-16, and a 1.54x acceleration and 1.47x compression on ResNet-50, both with about a 1% top-5 accuracy decrease.

The paper "An Entropy-based Pruning Method for CNN Compression" by Luo and Wu proposes compressing and accelerating Convolutional Neural Networks (CNNs) through a filter pruning strategy built on an entropy-based importance criterion. The method addresses the key obstacles to deploying heavy CNN models on resource-constrained devices, such as mobile phones, by reducing both the computational and storage demands of these networks while largely preserving accuracy.

The core proposal is the use of entropy as a metric for evaluating the importance of convolutional filters. The premise is that filters whose outputs carry little information, as indicated by low entropy scores, can be pruned with minimal impact on network performance. Pruning proceeds layer by layer, with a fine-tuning step after each pruning stage to recover the generalization ability lost during filter removal. The authors show this iterative scheme to be more effective than pruning the whole network at once, which can significantly degrade model performance.
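
The scoring step can be made concrete with a small sketch. The paper derives each filter's score from the entropy of its responses over a set of images; the version below assumes those responses are globally average-pooled feature maps discretized into bins, and the bin count, array shapes, and function name are illustrative rather than the paper's exact settings:

```python
import numpy as np

def entropy_importance(activations, n_bins=100):
    """Score each filter by the entropy of its pooled responses.

    activations: feature maps of one conv layer over a held-out image
    set, shape (n_images, n_filters, height, width).
    Returns one entropy score per filter; a lower score suggests the
    filter's output varies little across inputs and makes it a pruning
    candidate under this criterion.
    """
    # Global average pooling: one scalar response per filter per image.
    pooled = activations.mean(axis=(2, 3))        # (n_images, n_filters)

    scores = np.empty(pooled.shape[1])
    for j in range(pooled.shape[1]):
        # Discretize this filter's responses and estimate the
        # probability mass of each bin across the image set.
        counts, _ = np.histogram(pooled[:, j], bins=n_bins)
        p = counts / counts.sum()
        p = p[p > 0]                              # avoid log(0)
        scores[j] = -(p * np.log(p)).sum()        # Shannon entropy
    return scores

# Toy usage: 64 images, 128 filters; select the 16 lowest-entropy filters.
scores = entropy_importance(np.random.rand(64, 128, 14, 14))
prune_candidates = np.argsort(scores)[:16]
```

After removing the selected filters (and the corresponding input channels of the following layer), the network is fine-tuned before the next layer is pruned.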

In terms of performance, the entropy-based pruning method delivers substantial savings. On the ILSVRC-12 benchmark, it achieves a 3.3x acceleration and 16.64x compression on VGG-16 while incurring only around a 1% drop in top-5 accuracy. For ResNet-50, which has less inherent redundancy, the method achieves a 1.54x speed-up and 1.47x compression with a similarly minor impact on top-5 accuracy. These results indicate that the entropy-based criterion outperforms previously proposed measures of filter importance.

One distinctive aspect highlighted by the authors is the reduction in the size of intermediate activations, a factor typically overlooked in prior compression methods. By addressing this component, the approach not only compresses the model parameters but also reduces the run-time memory footprint, which is critical when deploying models on edge devices.
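
To see why filter pruning shrinks activation memory, note that a conv layer's output tensor scales linearly with its filter count. A back-of-the-envelope illustration (the layer size and pruning ratio below are made up for the example, not taken from the paper):

```python
# Activation memory of one conv layer's output, fp32, batch size 1.
height, width = 56, 56
filters_before, filters_after = 256, 128   # hypothetical 50% pruning

bytes_per_float = 4
mib = 2 ** 20
mem_before = height * width * filters_before * bytes_per_float / mib
mem_after = height * width * filters_after * bytes_per_float / mib

# Halving the filters halves this layer's activations: 3.1 -> 1.5 MiB.
print(f"activations: {mem_before:.1f} MiB -> {mem_after:.1f} MiB")
```

During training, such activations are stored for every layer and every example in the batch, which is why pruning them matters beyond parameter count alone.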

Beyond its practical implications, this work also contributes to the theoretical understanding of model compression. The use of entropy as a selection metric offers a new perspective on filter redundancy in CNN architectures, suggesting potential directions for further exploration in model optimization techniques.

Future developments stemming from this research could include extending the entropy-based approach to dynamic network pruning or integrating it with other compression techniques such as parameter quantization and network binarization. Moreover, exploring the applicability of this method in other vision tasks beyond classification, such as object detection or semantic segmentation, could considerably broaden the impact of the proposed compression strategy.

In conclusion, Luo and Wu have presented a robust method for reducing the computational burden of CNNs, enabling deployment on resource-constrained devices without significant loss in performance. This work marks a notable step in the ongoing exploration of model compression techniques and has substantial implications for deploying AI in mobile and embedded systems.

Authors (2)
  1. Jian-Hao Luo (7 papers)
  2. Jianxin Wu (82 papers)
Citations (170)