AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference (1805.08941v3)

Published 23 May 2018 in cs.CV

Abstract: Channel pruning is an important family of methods to speed up deep model's inference. Previous filter pruning algorithms regard channel pruning and model fine-tuning as two independent steps. This paper argues that combining them into a single end-to-end trainable system will lead to better results. We propose an efficient channel selection layer, namely AutoPruner, to find less important filters automatically in a joint training manner. Our AutoPruner takes previous activation responses as an input and generates a true binary index code for pruning. Hence, all the filters corresponding to zero index values can be removed safely after training. We empirically demonstrate that the gradient information of this channel selection layer is also helpful for the whole model training. By gradually erasing several weak filters, we can prevent an excessive drop in model accuracy. Compared with previous state-of-the-art pruning algorithms (including training from scratch), AutoPruner achieves significantly better performance. Furthermore, ablation experiments show that the proposed novel mini-batch pooling and binarization operations are vital for the success of filter pruning.

AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference

The proliferation of deep learning models, while achieving impressive results across numerous domains, has been accompanied by increasing concerns about their computational inefficiency. Particularly, deploying large models in resource-constrained environments poses significant challenges. The paper "AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference" introduces a novel approach to tackle these challenges by refining the process of model pruning.

AutoPruner distinguishes itself from traditional pruning methodologies by integrating pruning and model fine-tuning into a single, cohesive training framework. This departs from the conventional three-stage pipeline, which separates model pre-training, filter pruning, and fine-tuning. By unifying these steps, AutoPruner yields models with reduced size and computational cost while maintaining, or even improving, accuracy.

At the core of AutoPruner is a channel selection layer that identifies less significant filters during training itself. The layer takes the preceding activation responses as input and generates a binary index code indicating which filters can be safely removed after training; because the code is binary, removal leaves the retained network's behavior intact. Importantly, gradients flowing through this selection layer guide the pruning process and refine the model's feature representations.
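To make the mechanics concrete, the following PyTorch sketch illustrates such a selection layer. It is a minimal illustration of the idea described above, not the paper's exact implementation: the class name, the single fully-connected mapping, and the fixed `alpha` are assumptions, and the scaled sigmoid stands in for the paper's binarization (its slope would be increased during fine-tuning so the code converges to {0, 1}).

```python
import torch
import torch.nn as nn

class ChannelSelection(nn.Module):
    """Sketch of an AutoPruner-style channel selection layer.

    Pools a conv layer's activations over the mini-batch and spatial
    dimensions, maps them to one scalar per channel, and pushes the
    scalars toward {0, 1} with a scaled sigmoid whose slope `alpha`
    grows during fine-tuning. The (soft) binary code gates channels.
    """

    def __init__(self, num_channels: int, alpha: float = 1.0):
        super().__init__()
        self.fc = nn.Linear(num_channels, num_channels)
        self.alpha = alpha  # raised on a schedule so the code binarizes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) activation responses of the preceding conv layer.
        # Mini-batch pooling: average over batch and spatial dimensions
        # so the code reflects the whole mini-batch, not one sample.
        pooled = x.mean(dim=(0, 2, 3))                       # shape (C,)
        code = torch.sigmoid(self.alpha * self.fc(pooled))   # soft binary code
        # Gate each channel; channels whose code reaches 0 can be
        # deleted from the conv layer once training finishes.
        return x * code.view(1, -1, 1, 1)
```

Multiplying the activations by the code lets gradients reach the selection layer through the ordinary backward pass, which is what couples pruning with training.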

This research presents substantial quantitative evidence supporting the efficacy of AutoPruner. Evaluations on the fine-grained CUB200-2011 dataset and the large-scale ImageNet ILSVRC-12 dataset demonstrate its capability to outperform existing state-of-the-art pruning methods in both accuracy and FLOPs (floating-point operations). For instance, on CUB200-2011 at a compression rate of 0.5, AutoPruner offers a notable improvement in top-1 accuracy over methods like ThiNet. Similarly, AutoPruner exhibits superior performance when pruning ResNet-50 on ImageNet compared to other contemporary techniques.

The paper also delineates the critical components within the AutoPruner architecture. This includes the novel use of mini-batch pooling and binarization processes, which are crucial to the success of the pruning operation. The binarization, in particular, allows the generated index code to converge to binary values during fine-tuning, simplifying the removal of redundant channels. Furthermore, a carefully crafted loss function ensures the model achieves a predefined compression ratio while allowing for adaptability based on accuracy requirements.
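A hedged sketch of a loss in this spirit follows. The squared penalty that pulls the mean of the index code toward a target compression ratio is an assumption consistent with the description above; the function name and the weighting factor `lam` are illustrative choices, not values from the paper.

```python
import torch
import torch.nn.functional as F

def autopruner_style_loss(logits: torch.Tensor,
                          targets: torch.Tensor,
                          code: torch.Tensor,
                          target_ratio: float,
                          lam: float = 10.0) -> torch.Tensor:
    """Sketch of a pruning loss in the spirit described above.

    Combines the usual classification loss with a penalty that pulls
    the fraction of kept channels (mean of the soft binary code)
    toward the desired compression ratio. `lam` balances the terms.
    """
    cls_loss = F.cross_entropy(logits, targets)
    keep_ratio = code.mean()                 # fraction of channels kept
    ratio_loss = (keep_ratio - target_ratio) ** 2
    return cls_loss + lam * ratio_loss
```

In an end-to-end loop, this loss would be minimized while the sigmoid slope in the selection layer is gradually increased, so that the code hardens to binary values by the end of fine-tuning and the zero-indexed filters can simply be deleted.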

AutoPruner's methodology and results point toward more adaptive and efficient routes to model compression. The research suggests that letting information flow from uncompressed to compressed layers produces a more informed selection of filters, coupling pruning and training more closely. This end-to-end trainable framework not only preserves learned features but also generalizes across diverse tasks and datasets.

Looking ahead, the implications of this research extend to a broader scope of applications beyond image classification, such as object detection and semantic segmentation, where inference efficiency is crucial. Moreover, the AutoPruner approach prompts further investigations into the balance between model compactness and accuracy, paving the way for innovations in how deep learning models are deployed in real-time applications and on devices with limited computational resources.

In conclusion, AutoPruner marks an advancement in model acceleration techniques within deep learning, offering a robust framework that promises efficiency without compromising on performance. This refined integration of pruning and fine-tuning holds potential to redefine best practices in model training and deployment, with significant impacts on the design of future neural network architectures.

Authors (2)
  1. Jian-Hao Luo (7 papers)
  2. Jianxin Wu (82 papers)
Citations (197)