Pruning Filter in Filter (2009.14410v3)

Published 30 Sep 2020 in cs.CV

Abstract: Pruning has become a very powerful and effective technique to compress and accelerate modern neural networks. Existing pruning methods can be grouped into two categories: filter pruning (FP) and weight pruning (WP). FP wins at hardware compatibility but loses at the compression ratio compared with WP. To converge the strength of both methods, we propose to prune the filter in the filter. Specifically, we treat a filter $F \in \mathbb{R}^{C\times K\times K}$ as $K \times K$ stripes, i.e., $1\times 1$ filters $\in \mathbb{R}^{C}$, then by pruning the stripes instead of the whole filter, we can achieve finer granularity than traditional FP while being hardware friendly. We term our method as SWP (\emph{Stripe-Wise Pruning}). SWP is implemented by introducing a novel learnable matrix called Filter Skeleton, whose values reflect the shape of each filter. As some recent work has shown that the pruned architecture is more crucial than the inherited important weights, we argue that the architecture of a single filter, i.e., the shape, also matters. Through extensive experiments, we demonstrate that SWP is more effective compared to the previous FP-based methods and achieves the state-of-art pruning ratio on CIFAR-10 and ImageNet datasets without obvious accuracy drop. Code is available at https://github.com/fxmeng/Pruning-Filter-in-Filter

Citations (91)

Summary

  • The paper presents a novel Stripe-Wise Pruning (SWP) technique that prunes filters at the stripe level, offering finer pruning granularity than whole-filter removal while reducing overall model complexity.
  • It employs a learnable Filter Skeleton to guide the pruning process, allowing selective removal of less impactful stripes without extensive fine-tuning.
  • Experiments on architectures like VGG16 and ResNet18 demonstrate over 90% parameter reduction and FLOPs savings while maintaining competitive accuracy.

Overview of "Pruning Filter in Filter"

This paper presents a novel approach to neural network pruning, specifically focusing on a method referred to as Stripe-Wise Pruning (SWP). Traditional pruning methods have primarily centered around two strategies: weight pruning (WP) and filter pruning (FP). WP zeroes out individual weights, yielding high compression ratios but irregular sparsity that is difficult to exploit on standard hardware, while FP removes entire filters or channels, which keeps the pruned network dense and hardware friendly but achieves lower compression. The proposed SWP method aims to combine the advantages of both techniques by pruning filters at a finer granularity.

Technical Contribution

The SWP method regards each convolutional filter as a collection of K×K stripes, treating each stripe as an independent 1×1 filter along the channel dimension. This finer granularity allows for more precise pruning, which the authors argue is beneficial both for performance and hardware compatibility. The innovative aspect of their approach is that it allows pruning of less impactful stripes within a filter, rather than removing an entire filter outright.
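
To make the stripe view concrete, the sketch below (ours, not the authors' code) views a standard convolution weight of shape (N, C, K, K) as K×K stripes per filter and scores each stripe by its L1 norm; the keep/prune rule at the end is purely illustrative.

```python
# Minimal sketch: a conv weight of shape (N, C, K, K) seen as K*K "stripes"
# per filter, where each stripe is the length-C vector at spatial position (i, j).
import torch

def stripe_importance(weight: torch.Tensor) -> torch.Tensor:
    """Return an (N, K, K) tensor of per-stripe L1 norms.

    weight: conv weight of shape (N, C, K, K); position (i, j) of filter n
    holds a 1x1 stripe in R^C.
    """
    return weight.abs().sum(dim=1)  # sum over the channel dimension C

# Example: a 3x3 conv layer with 64 filters over 32 input channels.
w = torch.randn(64, 32, 3, 3)
scores = stripe_importance(w)       # shape (64, 3, 3)
mask = scores > scores.mean()       # hypothetical keep/prune rule, not the paper's criterion
print(mask.float().mean())          # fraction of stripes that would be kept
```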

To implement this, the authors introduce a mechanism called the Filter Skeleton (FS), a learnable matrix that captures the shape of each filter and guides the pruning process. The FS matrix is designed to be trained alongside the weights of the network, and its sparsity is encouraged through regularization. Once trained, the FS helps identify less important stripes that can be pruned to reduce computational load without significant performance degradation.
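
A minimal sketch of how such a Filter Skeleton could be wired into a convolution layer is given below. It follows the description above (a learnable matrix with one entry per stripe that scales the weights during training, pushed toward sparsity by an L1 penalty), but the module name, hyperparameters, and pruning threshold are illustrative assumptions rather than the authors' implementation; see their repository for the reference code.

```python
# Sketch of the Filter Skeleton (FS) idea, assuming a per-layer FS of shape
# (out_channels, K, K) that scales each stripe; entries driven to zero by an
# L1 penalty mark stripes that can be pruned.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripeConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k, padding=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.skeleton = nn.Parameter(torch.ones(out_ch, k, k))  # Filter Skeleton
        self.padding = padding

    def forward(self, x):
        # Scale every 1x1 stripe of every filter by its skeleton entry.
        w = self.weight * self.skeleton.unsqueeze(1)  # broadcast over input channels
        return F.conv2d(x, w, padding=self.padding)

    def skeleton_l1(self):
        return self.skeleton.abs().sum()

# Training-loop fragment (dummy task loss; coefficient and threshold are illustrative):
layer = StripeConv2d(32, 64, 3)
x = torch.randn(8, 32, 16, 16)
loss = layer(x).pow(2).mean() + 1e-5 * layer.skeleton_l1()  # stand-in loss + sparsity penalty
loss.backward()

# After training, stripes whose skeleton values stay small can be dropped:
keep = layer.skeleton.abs() > 0.05   # hypothetical pruning threshold
```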

Experimental Validation

Extensive experiments are conducted to validate the efficacy of the SWP approach. The results demonstrate that SWP achieves superior pruning ratios on datasets such as CIFAR-10 and ImageNet, surpassing existing FP-based methods in terms of both parameter reduction and FLOPs reduction, while maintaining competitive accuracy levels. Notably, the method allows for such reductions without necessitating extensive fine-tuning processes that are typically required when using traditional FP-based methods.

In the experimental setup, the authors detail several backbone neural network architectures, such as VGG16 and ResNet18, which serve as testbeds for demonstrating SWP’s capability to prune efficiently. Empirical results suggest that SWP can prune over 90% of parameters in these networks with negligible impact on accuracy, illustrating the potential of SWP to facilitate efficient deployment of deep networks in resource-constrained environments.
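
As a rough illustration of how such per-layer parameter-reduction figures can be derived from a stripe mask (our sketch, not the paper's evaluation script):

```python
# Counting parameters removed from one conv layer after stripe pruning.
import torch

def layer_param_reduction(weight: torch.Tensor, keep_mask: torch.Tensor) -> float:
    """weight: (N, C, K, K) conv weight; keep_mask: (N, K, K) boolean stripe mask."""
    n, c, k, _ = weight.shape
    total = n * c * k * k
    kept = int(keep_mask.sum().item()) * c  # each kept stripe stores C weights
    return 1.0 - kept / total

w = torch.randn(64, 32, 3, 3)
mask = torch.rand(64, 3, 3) > 0.9            # hypothetical mask keeping ~10% of stripes
print(f"parameters removed: {layer_param_reduction(w, mask):.1%}")
```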

Implications and Future Directions

The development of the SWP methodology has significant implications for the deployment of deep neural networks in scenarios where computational resources and power efficiency are critical, such as mobile devices and edge computing environments. The ability to achieve a substantial reduction in model size and computational load while retaining high classification performance is highly valuable.

From a theoretical standpoint, the SWP approach highlights an often-overlooked aspect of network architecture—the internal structure of filters—and posits that optimizing at such a micro-level can yield substantial gains. This opens up avenues for further research in exploring other granular levels of pruning and optimization, possibly integrating more dynamically adjustable architectures that adaptively learn optimal configurations tailored to specific tasks.

Overall, the "Pruning Filter in Filter" methodology presented in this paper contributes a significant advancement in neural network pruning, providing a new perspective on balancing the trade-offs between model complexity and computational efficiency. As AI continues to permeate more constrained operational contexts, innovations such as SWP will likely play a crucial role in the next generation of neural network architectures.
