- The paper introduces a group-wise pruning ("brain damage") technique that sparsifies convolutional kernel tensors in a structured way, reducing the cost of convolutions while preserving overall network accuracy.
- The paper integrates a group-sparsity regularizer into stochastic gradient descent to optimize receptive field shapes and prune redundant feature maps during training.
- Experiments show speedups of up to 8.5× for bottleneck convolutional layers (e.g., AlexNet's conv2 and conv3) at the cost of roughly a 1% drop in accuracy.
Analysis of "Fast ConvNets Using Group-wise Brain Damage"
The paper by Vadim Lebedev and Victor Lempitsky introduces a method for accelerating convolutional neural networks (ConvNets) through group-wise brain damage: pruning the coefficients of convolutional kernel tensors in a structured, group-wise manner to obtain significant computational speedup with minimal accuracy loss.
Core Approach and Methodology
At the heart of this research is a modified form of the brain damage technique, originally proposed by LeCun et al., which prunes neural network coefficients to improve efficiency. The authors propose a group-wise pruning mechanism tailored to convolutional layers, exploiting the fact that efficient ConvNet implementations reduce convolutions to matrix multiplications over lowered (im2col-style) matrices. Kernel tensor entries are organized into groups, where each group collects the coefficients that share the same input map and spatial offset across all output maps; zeroing a group removes an entire column of the lowered matrices. Selectively zeroing groups therefore yields thinner but still dense matrices, which accelerates the matrix multiplications and thus the ConvNet as a whole.
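To make the lowering concrete, the following is a minimal NumPy sketch (not the authors' implementation) of how zeroing groups shrinks the matrices involved in convolution-as-GEMM; the shapes, the toy mask, and the helper `lowered_patches` are illustrative assumptions.

```python
import numpy as np

# Hypothetical shapes: T output maps, S input maps, d x d spatial kernels.
T, S, d = 96, 48, 5
K = np.random.randn(T, S, d, d)          # kernel tensor
mask = np.random.rand(S, d, d) > 0.6     # toy group mask: True = keep group (s, i, j)

# Group-wise pruning: a group collects the entries K[:, s, i, j] over all
# output maps, so zeroing a group removes one column of the lowered matrices.
keep = np.flatnonzero(mask.reshape(-1))          # surviving (s, i, j) columns
K_matrix = K.reshape(T, S * d * d)[:, keep]      # thinner but dense filter matrix

def lowered_patches(x, mask):
    """Build the im2col-style data matrix, keeping only unmasked (s, i, j) columns."""
    S_, H, W = x.shape
    d_ = mask.shape[1]
    rows = []
    for y in range(H - d_ + 1):
        for xx in range(W - d_ + 1):
            patch = x[:, y:y + d_, xx:xx + d_].reshape(-1)
            rows.append(patch[keep])
    return np.array(rows)                        # (#positions, #kept groups)

x = np.random.randn(S, 27, 27)
P = lowered_patches(x, mask)
out = P @ K_matrix.T                             # dense GEMM on the reduced matrices
```

Because both the patch matrix and the filter matrix lose the same columns, the product remains an ordinary dense GEMM, just over fewer inner dimensions, which is what yields the speedup.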
The paper also explores how group-wise pruning can be integrated into the learning process itself, arguing that this allows the shapes of receptive fields to be optimized and superfluous feature maps to be pruned away. A central contribution is the incorporation of a group-sparsity (2,1-norm) regularizer into stochastic gradient descent, which gradually drives entire groups of coefficients toward zero and thereby performs group-wise brain damage during training.
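As a rough illustration of how such a regularizer can be attached to SGD, the sketch below adds a 2,1-norm group penalty (one group per input map and spatial offset) to a stand-in task loss in PyTorch; the framework choice, layer sizes, and λ value are assumptions, and the paper's actual training procedure includes details not reproduced here.

```python
import torch

# Minimal sketch (not the authors' code): a 2,1-norm group-sparsity penalty
# over groups that share the same input map and spatial offset (s, i, j),
# added to the task loss during an otherwise ordinary SGD step.
lambda_reg = 1e-4                      # assumed regularization strength

def group_sparsity_penalty(weight, eps=1e-12):
    """weight: (T, S, d, d) conv kernel; one group per (s, i, j) across output maps."""
    # l2 norm over the output-map dimension, summed over all groups (l2,1 norm).
    return (weight.pow(2).sum(dim=0) + eps).sqrt().sum()

conv = torch.nn.Conv2d(48, 96, kernel_size=5)
opt = torch.optim.SGD(conv.parameters(), lr=1e-2, momentum=0.9)

x = torch.randn(8, 48, 27, 27)
target = torch.randn(8, 96, 23, 23)

task_loss = torch.nn.functional.mse_loss(conv(x), target)   # stand-in task loss
loss = task_loss + lambda_reg * group_sparsity_penalty(conv.weight)
opt.zero_grad()
loss.backward()
opt.step()
```

Groups whose l2 norm is pushed to (near) zero by the penalty can then be pruned outright, which is what realizes the structured sparsity described above.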
Experimental Results and Benchmarks
The authors conducted a series of experiments measuring the performance gains achieved through their proposed pruning strategies. Notably, they demonstrate that this approach can accelerate the bottleneck layers of popular architectures like AlexNet, achieving a speedup factor of up to 8.5 times for the 'conv2' and 'conv3' layers, with a modest accuracy drop of roughly 1%.
These results are compared against several recent tensor factorization-based acceleration methods. The proposed group-wise brain damage approach achieves comparable or better speed-accuracy trade-offs, underscoring the efficacy of structured sparsity as an alternative to low-rank factorization.
Practical and Theoretical Implications
Practically, this methodology presents a pathway for deploying efficient real-time and large-scale ConvNet applications where computational resources are a limiting factor. Theoretically, it underscores the potential of using structured sparsity as a mechanism for discovering optimal network architectures. The ability to automatically adjust convolutional layer connectivity based on input data further advances the field's understanding of dynamic model adaptation.
Future Directions and Developments
Given the promising results and implications, future work may extend these methods to deeper networks such as VGGNet, where initial experiments already show potential. Extending the implementation to GPUs, covering both the forward and backward passes, would also be beneficial, as would exploring the role of hierarchical group-sparsity regularizers in enhancing pruning strategies.
Overall, this paper contributes valuable insights and techniques for accelerating ConvNets, blending rigorous methodology with empirical validation, and opening up avenues for future research in efficient deep learning architectures.