Network Compression with Group Sparsity: A Unified Approach to Pruning and Decomposition
Neural network compression is a critical step in optimizing convolutional neural networks (CNNs), especially for deployment on resource-limited devices. Among various compression strategies, filter pruning and low-rank decomposition are prominent for reducing computational demands. This paper examines these techniques from a unified perspective, leveraging group sparsity as the hinge that connects the two methods and combines them into a single approach.
Unified Framework for Compression
The authors propose a unified framework that integrates filter pruning and filter decomposition through group sparsity regularization. Traditionally, pruning removes less influential filters to streamline the model architecture, whereas decomposition approximates large convolutional layers by factorizing them into compact component operations. The novel contribution lies in recognizing the shared objective of both approaches, namely approximating the network weights while maintaining performance, and aligning them under group sparsity constraints.
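In schematic form, this shared objective can be written as a task loss plus a group-sparsity term over sparsity-inducing matrices A^(l) attached to the convolutions (introduced in the next section). The notation below is a sketch of the general form rather than the paper's exact formulation:

```latex
\min_{\{W^{(l)}\},\,\{A^{(l)}\}}\;
  \mathcal{L}\!\left(f\big(x;\,\{W^{(l)},A^{(l)}\}\big),\,y\right)
  \;+\; \lambda \sum_{l}\sum_{g\in\mathcal{G}^{(l)}} \bigl\lVert A^{(l)}_{g}\bigr\rVert_{2}
```

Here each group g in G^(l) is either a row or a column of A^(l); which one is chosen determines whether a layer ends up pruned or decomposed, as discussed next.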
Group Sparsity as a Versatile Mechanism
Sparsity is induced through a sparsity-inducing matrix appended to each convolution, on which a group sparsity regularizer is imposed. By adjusting whether the regularization is enforced on the rows or the columns of this matrix, the paper demonstrates that one can toggle between pruning and decomposition effects: column sparsity leads to filter pruning, whereas row sparsity yields low-rank decomposition. This dual capability offers significant flexibility for configuring the compression technique on a per-layer basis, for example according to the presence of shortcut connections in architectures like ResNet.
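As a minimal sketch of this toggle (assuming the sparsity-inducing matrix is stored as a plain 2-D tensor; its placement relative to the convolution and the later merging step are simplified away), the group-lasso penalty over columns or rows could be computed in PyTorch as follows:

```python
import torch

def group_sparsity_penalty(A: torch.Tensor, groups: str) -> torch.Tensor:
    """Group-lasso penalty on a 2-D sparsity-inducing matrix A.

    groups == "columns": each column of A forms one group; per the discussion
        above, column sparsity corresponds to filter pruning.
    groups == "rows": each row of A forms one group; per the discussion above,
        row sparsity corresponds to low-rank decomposition.
    """
    dim = 0 if groups == "columns" else 1  # the l2 norm is taken over each group's entries
    return A.norm(p=2, dim=dim).sum()

# Illustrative usage with an arbitrary 64x64 matrix; switch groups to "rows"
# to favor decomposition instead of pruning.
A = torch.randn(64, 64, requires_grad=True)
penalty = 1e-4 * group_sparsity_penalty(A, groups="columns")
penalty.backward()
```

Driving whole groups to exactly zero, rather than individual entries, is what makes the resulting structure removable at inference time.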
Method and Implementation
The optimization problem is tackled with proximal gradient descent, a technique well suited to non-smooth regularizers. The paper details an algorithm that adjusts learning rates based on gradient magnitudes and employs a binary search to refine the sparsity thresholds used for compression. These implementation choices keep compression balanced across layers and yield proximal solutions that meet target metrics such as FLOP and parameter reductions.
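As a minimal sketch of the proximal step (the learning-rate schedule, the binary search over thresholds, and the per-layer grouping are omitted here; variable names are illustrative), the proximal operator of the group-lasso penalty reduces to group-wise soft-thresholding:

```python
import torch

def prox_group_lasso(A: torch.Tensor, threshold: float, dim: int) -> torch.Tensor:
    """Proximal operator of the group-lasso penalty (group-wise soft-thresholding).

    Each slice of A along `dim` is one group: groups whose l2 norm falls below
    `threshold` are set exactly to zero, and the remaining groups are shrunk.
    """
    norms = A.norm(p=2, dim=dim, keepdim=True)
    scale = torch.clamp(1.0 - threshold / (norms + 1e-12), min=0.0)
    return A * scale

# One proximal gradient step on A: take a gradient step on the task loss,
# then apply the proximal operator with threshold lr * lam.
lr, lam = 0.1, 1e-3
A = torch.randn(64, 64)
grad = torch.randn_like(A)  # stand-in for the data-loss gradient w.r.t. A
A = prox_group_lasso(A - lr * grad, threshold=lr * lam, dim=0)  # dim=0: column groups
```

Groups whose norm falls below the threshold land exactly at zero, which is what eventually allows whole filters or rank components to be removed.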
Experimental validation on prevalent network architectures (VGG, ResNet, DenseNet, and others) reveals competitive performance of the proposed method against state-of-the-art techniques. The approach consistently delivers reduced error rates and parameter requirements across CIFAR and ImageNet datasets, substantiating its practical efficacy.
Implications and Future Directions
The implications are two-fold: theoretical and practical. Theoretically, this paper invites further exploration into the combined use of network compression techniques by formulating a new conceptual model for understanding sparsity and decomposition. Practically, the introduction of group sparsity as a hinge between pruning and decomposition could serve as a blueprint for developing more adaptable and efficient compression tools.
The authors speculate that future advancements may involve extending these principles to other types of networks, such as transformer architectures, and considering additional nuances of model deployment environments, including hardware constraints. This raises important questions about the criteria for optimizing compression strategies to ensure both computational efficiency and operational fidelity across varied contexts.
In summary, this research offers a compelling method for network compression, enlarging the toolkit available to practitioners in the field and carving pathways toward increasingly sophisticated neural network optimization strategies.