
Pruning Convolutional Neural Networks for Resource Efficient Inference (1611.06440v2)

Published 19 Nov 2016 in cs.LG and stat.ML

Abstract: We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation - a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, e.g. the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102), relying only on first-order gradient information. We also show that pruning can lead to more than 10x theoretical (5x practical) reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier. Finally, we show results for the large-scale ImageNet dataset to emphasize the flexibility of our approach.

An Overview of Pruning Convolutional Neural Networks for Resource Efficient Inference

The paper "Pruning Convolutional Neural Networks for Resource Efficient Inference" introduces a novel approach for reducing the computational demands of convolutional neural networks (CNNs) through parameter pruning, with a specific focus on convolutional kernels. The proposed methodology interlaces a greedy, criterion-based pruning with backpropagation-based fine-tuning. This synthesis aims to maintain the generalization capacity of the network post-pruning.

Key Contributions and Methodology

A pivotal contribution of the paper is a pruning criterion grounded in a first-order Taylor expansion of the cost function. The criterion estimates the change in cost that would result from removing a given feature map, and it is cheap to evaluate because it reuses the gradient information already computed during backpropagation. The research is primarily centered on transfer learning, specifically the adaptation of large pretrained networks to specialized, fine-grained classification tasks such as the Birds-200 and Flowers-102 datasets.
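
To make the criterion concrete, the following is a minimal PyTorch sketch (the framework, the toy convolution, and the MSE loss are illustrative assumptions, not the authors' code). For each feature map, the score is the absolute value of the activation multiplied by its gradient, averaged over the batch and spatial positions, followed by the layer-wise L2 re-normalization the paper uses to make scores comparable across layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy stand-ins: one conv layer and random data in
# place of a pretrained network and its task loss.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(4, 3, 16, 16)
target = torch.randn(4, 8, 16, 16)

z = conv(x)          # feature maps, shape (N, K, H, W)
z.retain_grad()      # keep dC/dz on this non-leaf tensor
loss = F.mse_loss(z, target)
loss.backward()

# First-order Taylor estimate of the cost change from zeroing map k:
# theta_k = | mean over batch and spatial positions of (dC/dz * z) |
theta = (z.grad * z).mean(dim=(0, 2, 3)).abs()

# Layer-wise L2 re-normalization so scores compare across layers.
theta = theta / theta.norm(p=2)
print(theta.argsort())  # lowest-scoring maps are pruning candidates
```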

The pruning procedure is iterative. The network is first fine-tuned to convergence on the target task; it then undergoes successive cycles of pruning and fine-tuning until a predefined trade-off is reached between accuracy and resource cost, measured in floating-point operations (FLOPs) and memory usage. A sketch of this loop follows.
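
The sketch below illustrates the prune/fine-tune cycle under the same toy assumptions as above. For brevity it "prunes" by masking a feature map to zero rather than structurally removing it, and it performs a single SGD step where the paper runs many fine-tuning updates between pruning steps; both simplifications are mine, not the authors'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def taylor_scores(z: torch.Tensor) -> torch.Tensor:
    """Per-map Taylor criterion |mean(dC/dz * z)|, L2-normalized."""
    s = (z.grad * z).mean(dim=(0, 2, 3)).abs()
    return s / s.norm(p=2)

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
mask = torch.ones(8)  # 1 = active map, 0 = "pruned" (masked out)
opt = torch.optim.SGD(conv.parameters(), lr=1e-2)

for cycle in range(4):                  # one map removed per cycle
    x = torch.randn(4, 3, 16, 16)
    target = torch.randn(4, 8, 16, 16)
    z = conv(x)
    z.retain_grad()
    loss = F.mse_loss(z * mask.view(1, -1, 1, 1), target)
    loss.backward()

    # Prune: disable the lowest-scoring map that is still active.
    scores = taylor_scores(z).masked_fill(mask == 0, float("inf"))
    mask[scores.argmin()] = 0.0

    # Fine-tune: one step here; many updates in the actual procedure.
    opt.step()
    opt.zero_grad()
```

An actual implementation would drop the pruned channel's weights and the corresponding input channels of the next layer, which is what realizes the FLOPs and memory savings that masking alone does not.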

Numerical Results and Analysis

Empirically, the Taylor expansion-based criterion outperforms traditional pruning criteria, such as those based on kernel weight norms or feature map activations. Notably, the paper reports a greater-than-tenfold theoretical (roughly fivefold practical) reduction in adapted 3D-convolutional filters for a recurrent gesture classifier, with only a marginal decrease in accuracy. Applied to large networks pretrained on ImageNet, the method likewise yields significant resource savings, supporting the broad applicability and robustness of the approach.

Implications and Future Directions

From a practical standpoint, the proposed method holds substantial promise for deploying deep learning models in resource-constrained environments, such as embedded devices, without significantly compromising predictive accuracy. Theoretically, it enriches our understanding of parameter importance within CNNs, particularly of how sparsity can be induced and managed through structured pruning.

Looking forward, the intersection of neural network pruning and hardware advancements suggests an avenue for developing specialized processors that can further harness the sparsity induced by pruning. Additionally, extending this approach to include other neural network architectures, such as transformers, could offer further insights into generalized pruning strategies across varying model types.

In conclusion, the paper contributes a coherent and empirically validated approach to network pruning that advances computational efficiency while preserving performance on the target tasks, setting the stage for further exploration and innovation in resource-efficient deep learning models.

Authors (5)
  1. Pavlo Molchanov (70 papers)
  2. Stephen Tyree (29 papers)
  3. Tero Karras (26 papers)
  4. Timo Aila (23 papers)
  5. Jan Kautz (215 papers)
Citations (410)