An Algorithm for Channel Pruning in Deep Convolutional Neural Networks
The paper presents a methodical approach to channel pruning for convolutional neural networks (CNNs), focusing primarily on computational efficiency without substantially compromising model performance. The discussion begins with the formulation of the channel pruning algorithm for a single convolutional layer and extends this to the entire model. The methodology also adapts to multi-branch networks such as GoogLeNet and ResNet.
Formulation and Subproblem Solutions
The channel pruning algorithm targets the reduction of channels in the input feature map of a layer while preserving that layer's outputs as closely as possible. The crux of the algorithm is to judiciously select channels and reconstruct the feature maps accordingly. The exact channel selection problem is NP-hard, so the authors propose a relaxation from $\ell_0$ to $\ell_1$ regularization.
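Concretely (following the paper's formulation, up to notation), let $Y$ be the $N \times n$ matrix of sampled output points, $X_i$ the sampled input slices from channel $i$, $W_i$ the weights applied to channel $i$, and $\beta \in \mathbb{R}^{c}$ the channel selection vector over the $c$ input channels. The relaxed objective is

$$
\min_{\beta, W}\ \frac{1}{2N}\left\lVert Y - \sum_{i=1}^{c} \beta_i\, X_i W_i^{\top} \right\rVert_F^2 + \lambda \lVert \beta \rVert_1 \quad \text{subject to } \lVert \beta \rVert_0 \le c',
$$

where $c'$ is the number of retained channels; the paper additionally normalizes each $\lVert W_i \rVert_F = 1$ to rule out trivial solutions.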
- Channel Selection Subproblem: The paper employs the Least Absolute Shrinkage and Selection Operator (LASSO) to determine the channels to prune.
- Reconstruction Subproblem: The remaining channels are utilized to reconstruct the feature map, achieved through a linear least squares method.
These subproblems are tackled alternately and iteratively. Initially all channels are selected with no penalty, and the penalty coefficient $\lambda$ is gradually increased to induce sparsity in the channel selection vector $\beta$. The iteration continues until the number of retained channels meets the target and $\beta$ stabilizes.
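A minimal NumPy/scikit-learn sketch of this alternation is shown below. It assumes, for simplicity, a $1\times1$ kernel so that each input channel contributes one column of $X$; the function name `prune_layer_channels`, the initial value of $\lambda$, and the doubling schedule are illustrative choices rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def prune_layer_channels(X, W, Y, n_keep, lam=1e-4, max_steps=20):
    """Select n_keep input channels of one layer and reconstruct its weights.

    X: (N, c) sampled inputs, one column per input channel (1x1-kernel simplification)
    W: (c, n) current layer weights
    Y: (N, n) sampled outputs of the original layer to be reconstructed
    """
    N, c = X.shape
    # Per-channel contributions: Z[:, i] is the flattened output produced by channel i alone.
    Z = np.stack([(X[:, [i]] @ W[[i], :]).ravel() for i in range(c)], axis=1)
    y = Y.ravel()

    # (i) Channel selection: with W fixed, solve a LASSO problem over beta,
    # gradually raising the penalty until at most n_keep channels survive
    # (as in the paper, step (i) is applied repeatedly before a single
    # reconstruction step (ii)).
    beta = None
    for _ in range(max_steps):
        coef = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(Z, y).coef_
        if np.count_nonzero(coef) <= n_keep:
            beta = coef if beta is None else beta
            break
        beta, lam = coef, lam * 2.0
    kept = np.sort(np.argsort(-np.abs(beta))[:n_keep])

    # (ii) Reconstruction: with the kept channels fixed, refit the weights by
    # ordinary least squares so the layer output approximates Y again.
    W_new, *_ = np.linalg.lstsq(X[:, kept], Y, rcond=None)
    return kept, W_new
```

For instance, for a layer with 64 input and 128 output channels and 256 sampled positions, `prune_layer_channels(X, W, Y, n_keep=32)` would return the indices of the 32 retained channels and a reconstructed `(32, 128)` weight matrix.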
Whole Model Pruning
The authors extend the single-layer pruning approach to the entire network by pruning layers sequentially, one at a time; each layer's reconstruction target is computed from the original (unpruned) model, so error accumulated in earlier pruned layers is compensated. This approach works directly for single-branch networks like AlexNet and the VGG nets. However, multi-branch networks such as ResNet present additional challenges, particularly with the pruning of residual structures.
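Reusing `prune_layer_channels` from the sketch above, a toy sequential loop over a purely linear stack of $1\times1$ layers (an illustration only; real networks interleave nonlinearities and spatial kernels) could look like this:

```python
import numpy as np

def prune_model(layers, x0, keep_counts):
    """layers: list of (c_in, c_out) weight matrices of the trained model.
    x0: (N, c_0) calibration inputs.  keep_counts[i]: input channels kept at layer i."""
    original = [W.copy() for W in layers]
    a_pruned, a_orig = x0, x0            # activations entering the current layer
    for i in range(len(layers)):
        if i > 0:                        # the raw network input itself is never pruned
            # The reconstruction target comes from the ORIGINAL model, so error
            # accumulated in earlier (already pruned) layers is compensated here.
            y_ref = a_orig @ original[i]
            kept, W_new = prune_layer_channels(a_pruned, layers[i], y_ref, keep_counts[i])
            layers[i - 1] = layers[i - 1][:, kept]   # drop the matching upstream filters
            layers[i] = W_new                        # reconstructed weights on kept channels
            a_pruned = a_pruned[:, kept]
        a_pruned = a_pruned @ layers[i]              # advance to the next layer's input
        a_orig = a_orig @ original[i]
    return layers
```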
Multi-Branch Network Considerations
For multi-branch networks, the authors propose novel adaptations:
- Last Layer of Residual Branch: Rather than reconstructing the branch output alone, the last layer is optimized so that the sum of the shortcut and residual branches approximates the original block output. This is necessary because the shortcut is parameter-free, so its accumulated error cannot be recovered directly.
- First Layer of Residual Branch: The authors introduce sampling before the first convolution to maintain computational efficiency without pruning channels shared with the shortcut branch (a minimal indexing sketch appears below).
Additionally, a filter-wise approach is proposed for the initial convolution on the residual branch. This method prunes input channels independently for each filter, resulting in "irregular" convolutions but achieving higher accuracy without fine-tuning.
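As a rough illustration of the channel sampler placed before the residual branch's first convolution (my own sketch under an assumed `(N, C, H, W)` feature-map layout, not the paper's code), sampling is just channel indexing; the shortcut keeps reading the full input:

```python
import numpy as np

def sample_channels(x, keep_idx):
    """x: feature map of shape (N, C, H, W); keep_idx: channels the residual branch consumes."""
    return x[:, keep_idx, :, :]

# The identity shortcut still sees all 64 channels, while the residual
# branch's first conv only needs weights for the sampled subset.
x = np.random.randn(2, 64, 8, 8)
keep_idx = np.array([0, 3, 7, 10])              # e.g. channels chosen by the LASSO step
residual_input = sample_channels(x, keep_idx)   # shape (2, 4, 8, 8)
shortcut = x                                    # left untouched
```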
Practical and Theoretical Implications
The proposed methodology shows significant potential for reducing the computational cost of CNNs while retaining model performance. The distinction from other works is noteworthy: whereas previous methods incorporate sparsity regularization into the training loss, this approach explicitly solves a LASSO problem on the feature maps of an already-trained model, so it can be applied at inference time without retraining from scratch, broadening its utility.
Theoretical implications include a deeper understanding of channel redundancy in deep networks and the potential for further refinement in model compression techniques. Practically, this method can lead to more efficient deployment of neural networks on resource-constrained devices such as mobile phones and edge computing platforms.
Future Developments in AI
The adaptability of this method to both single and multi-branch networks suggests a trajectory towards more robust and versatile model compression strategies. Future research might focus on automated determination of the penalty coefficient $\lambda$ and extending this approach to other network architectures, including transformer models. Enhanced library support for irregular convolutions, as suggested in the paper, could further improve the practicality and adoption of these techniques.
In summary, this paper provides a comprehensive approach to channel pruning, addressing both single-branch and multi-branch networks. The iterative optimization method grounded in LASSO and linear least squares promises significant advancements in efficient deep learning model deployment.