
Channel Pruning for Accelerating Very Deep Neural Networks (1707.06168v2)

Published 19 Jul 2017 in cs.CV

Abstract: In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least square reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results with a 5x speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks like ResNet and Xception, suffering only 1.4% and 1.0% accuracy loss respectively under a 2x speed-up, which is significant. Code has been made publicly available.

An Algorithm for Channel Pruning in Deep Convolutional Neural Networks

The paper presents a methodical approach to channel pruning for convolutional neural networks (CNNs), focusing primarily on computational efficiency without substantially compromising model performance. The discussion begins with the formulation of the channel pruning algorithm for a single convolutional layer and extends this to the entire model. The methodology also adapts to multi-branch networks such as GoogLeNet and ResNet.

Formulation and Subproblem Solutions

The channel pruning algorithm reduces the number of channels in a layer's input feature map while preserving the layer's outputs as closely as possible. The crux of the algorithm is to judiciously select which channels to keep and to reconstruct the outputs from them. The exact selection problem is NP-hard, so the authors relax the constraint from $\ell_0$ to $\ell_1$ regularization.

  1. Channel Selection Subproblem: The paper employs the Least Absolute Shrinkage and Selection Operator (LASSO) to determine the channels to prune.
  2. Reconstruction Subproblem: The remaining channels are utilized to reconstruct the feature map, achieved through a linear least squares method.

These subproblems are solved alternately. Initially all channels are selected with no penalty, and the penalty coefficient $\lambda$ is gradually increased to induce sparsity in the channel selection vector $\beta$; the iteration continues until the set of retained channels stabilizes.
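
To make the two-step procedure concrete, the sketch below implements it in Python with NumPy and scikit-learn's Lasso. It assumes the layer's inputs have already been sampled into X of shape (N, c, k) (N spatial positions, c input channels, k flattened kernel entries), W of shape (n, c, k) holds the layer's n filters, and Y of shape (N, n) holds the original model's responses at the same positions; the function name prune_layer and the argument n_keep are illustrative, not taken from the paper's released code. Following the paper's practical variant, the LASSO step is repeated with a growing penalty and the least-squares reconstruction is applied once at the end.

```python
import numpy as np
from sklearn.linear_model import Lasso

def prune_layer(X, W, Y, n_keep, alpha=1e-4, max_iter=20):
    """Sketch of one channel-pruning step: LASSO selection + least-squares reconstruction."""
    N, c, k = X.shape
    n = W.shape[0]
    # Per-channel contributions: Z[s, i, j] = X[s, i, :] . W[j, i, :]
    Z = np.einsum('sik,jik->sij', X, W)                  # shape (N, c, n)
    # (i) Channel selection: LASSO on the selection vector beta, filters W fixed.
    A = Z.transpose(0, 2, 1).reshape(N * n, c)           # rows index (sample, filter) pairs
    y = Y.reshape(N * n)
    for _ in range(max_iter):
        lasso = Lasso(alpha=alpha, fit_intercept=False)
        lasso.fit(A, y)
        beta = lasso.coef_
        if np.count_nonzero(beta) <= n_keep:
            break
        alpha *= 2.0                                     # raise the penalty until enough channels are zeroed out
    keep = np.nonzero(beta)[0]
    # (ii) Reconstruction: least squares on the surviving channels.
    X_keep = X[:, keep, :].reshape(N, -1)
    W_new, *_ = np.linalg.lstsq(X_keep, Y, rcond=None)   # shape (len(keep)*k, n)
    return keep, W_new.T.reshape(n, len(keep), k)
```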

Whole Model Pruning

The single-layer approach extends to the whole network by pruning one layer at a time, sequentially. For each layer, the inputs are taken from the model pruned so far while the reconstruction targets remain the original model's feature maps, so error accumulated from earlier layers is taken into account. This approach is efficient for single-branch networks like AlexNet and the VGG nets. However, multi-branch networks such as ResNet present additional challenges, particularly in pruning the residual structures.
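
A minimal sketch of this sequential loop is given below, assuming prune_layer from the sketch above. The helpers sample_layer_io and set_layer_weights are hypothetical placeholders for framework-specific plumbing: sampling input volumes from the model pruned so far, reading target responses from the untouched original model, and writing back the reduced filters.

```python
def prune_model(pruned_model, original_model, layer_names, keep_ratio, images):
    """Sketch: prune layers one at a time, front to back.

    Inputs X are sampled from the model pruned so far, while targets Y come from
    the original model, so each reconstruction also absorbs the error accumulated
    by earlier pruning. (sample_layer_io / set_layer_weights are hypothetical.)
    """
    for name in layer_names:
        X, W, Y = sample_layer_io(pruned_model, original_model, name, images)
        n_keep = max(1, int(keep_ratio * X.shape[1]))     # channels to retain in this layer
        keep, W_new = prune_layer(X, W, Y, n_keep)
        set_layer_weights(pruned_model, name, keep, W_new)
    return pruned_model
```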

Multi-Branch Network Considerations

For multi-branch networks, the authors propose novel adaptations:

  1. Last Layer of Residual Branch: The reconstruction target for the last layer of the residual branch is changed from that branch's own output to the block's original output minus the current (pruned) shortcut feature map. This compensates for error on the shortcut path, which is parameter-free and therefore cannot be recovered directly (see the objective sketched after this list).
  2. First Layer of Residual Branch: Because the input feature map is shared with the shortcut branch and cannot itself be pruned, the authors insert a channel sampler before the first convolution so that it reads only the selected channels, preserving the computational savings.
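
For item 1 this amounts to keeping the single-layer objective but shifting its target: writing $\mathbf{Y}_1$ and $\mathbf{Y}_2$ for the original shortcut and residual-branch outputs and $\mathbf{Y}_1'$ for the shortcut feature map after upstream pruning, the last residual layer is reconstructed against a compensated target (a sketch of the objective, with notation adapted from the paper's single-layer formulation):

$$\min_{\beta,\,\mathbf{W}} \; \Big\| \left(\mathbf{Y}_1 - \mathbf{Y}_1' + \mathbf{Y}_2\right) - \sum_{i=1}^{c} \beta_i \mathbf{X}_i \mathbf{W}_i^{\top} \Big\|_F^2 \quad \text{s.t.} \;\; \|\beta\|_0 \le c'$$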

Additionally, a filter-wise approach is proposed for the initial convolution on the residual branch. This method prunes input channels independently for each filter, resulting in "irregular" convolutions but achieving higher accuracy without fine-tuning.
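
As a rough illustration of the filter-wise variant, each output filter can run its own LASSO over the per-channel contributions, so different filters may keep different input channels. The sketch below reuses the contribution tensor Z and responses Y from the single-layer sketch; the function name is illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def filterwise_select(Z, Y, alpha=1e-4):
    """Sketch of filter-wise selection: Z has shape (N, c, n), Y has shape (N, n)."""
    keeps = []
    for j in range(Y.shape[1]):
        lasso = Lasso(alpha=alpha, fit_intercept=False)
        lasso.fit(Z[:, :, j], Y[:, j])            # select input channels for filter j alone
        keeps.append(np.nonzero(lasso.coef_)[0])  # per-filter channel set -> "irregular" convolution
    return keeps
```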

Practical and Theoretical Implications

The proposed methodology shows significant potential for reducing the computational cost of CNNs while retaining model performance. The distinction from other works is noteworthy: whereas previous methods incorporate sparsity regularization into the training loss, this approach explicitly solves a LASSO problem and reconstructs feature maps during optimization. This makes the method applicable to already-trained models at inference time, broadening its utility.

Theoretical implications include a deeper understanding of channel redundancy in deep networks and the potential for further refinement in model compression techniques. Practically, this method can lead to more efficient deployment of neural networks on resource-constrained devices such as mobile phones and edge computing platforms.

Future Developments in AI

The adaptability of this method to both single- and multi-branch networks suggests a trajectory towards more robust and versatile model compression strategies. Future research might focus on automated determination of the penalty coefficient $\lambda$ and on extending the approach to other network architectures, including transformer models. Enhanced library support for irregular convolutions, as suggested in the paper, could further improve the practicality and adoption of these techniques.

In summary, this paper provides a comprehensive approach to channel pruning, addressing both single-branch and multi-branch networks. The iterative optimization method grounded in LASSO and linear least squares promises significant advancements in efficient deep learning model deployment.

Authors (3)
  1. Yihui He (25 papers)
  2. Xiangyu Zhang (328 papers)
  3. Jian Sun (414 papers)
Citations (2,426)