Insights into "Importance Estimation for Neural Network Pruning"
The paper, "Importance Estimation for Neural Network Pruning," presents a methodological advancement in network pruning by Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, and Jan Kautz from NVIDIA. The authors introduce a novel pruning criterion based on the Taylor expansion of the loss function, which estimates the importance of neurons (or filters) and iteratively removes the least significant ones.
Core Contributions
The authors present a significant improvement over existing pruning methodologies by offering the following key contributions:
- Novel Pruning Criterion: The paper introduces a criterion based on first- and second-order Taylor expansions to approximate the change in loss induced by removing a filter. This approach leverages gradient information that is readily available during standard training.
- Scalability: Unlike previous methods that required per-layer sensitivity analysis, the proposed criterion produces importance scores that are comparable across network layers and can therefore be applied globally.
- High Correlation with True Importance: For modern networks trained on benchmarks such as ImageNet, the proposed criterion exhibits a high correlation (>93%) with an empirically reliable importance estimate.
- Empirical Evidence: Experimental validation on networks such as ResNet-101 shows a 40% reduction in FLOPs (floating-point operations) with a loss of only 0.02% in top-1 accuracy on ImageNet, demonstrating state-of-the-art trade-offs between accuracy, computational cost, and parameter count.
Methodology
The method for importance estimation is grounded in the calculation of first- and second-order Taylor expansions of the loss function. Specifically:
- First-Order Approximation (Taylor-FO): The importance of a parameter $w_m$ is computed as $\mathcal{I}_m = (g_m w_m)^2$, where $g_m = \partial \mathcal{L}/\partial w_m$ is the gradient of the loss with respect to that weight; the importance of a filter is obtained by aggregating the scores of its weights (a minimal sketch of this computation follows the list).
- Second-Order Approximation (Taylor-SO): This variant retains the second-order term of the expansion using the diagonal of the Hessian, e.g. $\mathcal{I}_m = \big(g_m w_m - \tfrac{1}{2} H_{mm} w_m^2\big)^2$, providing a more precise but computationally more expensive importance measure.
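The first-order criterion is cheap to evaluate because the required gradients are produced by ordinary backpropagation. Below is a minimal PyTorch sketch of one plausible implementation (not the authors' released code), assuming the per-weight scores $(g_m w_m)^2$ are summed within each convolutional filter; the function and toy model are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def taylor_fo_importance(model):
    """Per-filter first-order Taylor importance: for each Conv2d,
    sum (gradient * weight)^2 over every weight in each output filter.
    Assumes loss.backward() has already populated .grad."""
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            w, g = module.weight, module.weight.grad
            # weight shape: (out_channels, in_channels, kH, kW)
            scores[name] = (g * w).pow(2).sum(dim=(1, 2, 3))
    return scores

# Usage on a toy model and a random batch (illustrative only):
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(8 * 30 * 30, 10))
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
F.cross_entropy(model(x), y).backward()
scores = taylor_fo_importance(model)
print(scores["0"])  # importance of the 8 filters in the first conv layer
```

In practice, the paper accumulates these scores over many mini-batches before ranking, which smooths the estimate.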
Experimental Results
CIFAR-10
Using ResNet-18, the paper demonstrates that both the Taylor-FO and Taylor-SO methods closely match the performance of a greedy oracle, which ranks neurons by the actual change in loss measured when each is removed individually. Both variants outperform weight magnitude-based methods. A particularly significant result is the high Spearman correlation (up to 0.957) between the Taylor-FO ranking and the oracle's, indicating that the proposed criteria are robust proxies for neuron importance.
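For intuition, this agreement is measured as the Spearman rank correlation between the two sets of importance scores. The snippet below shows the mechanics of that check with synthetic stand-in scores; the real evaluation would use the criterion's scores and the oracle's measured loss changes.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
oracle = rng.random(512)                             # stand-in oracle scores
taylor_fo = oracle + 0.1 * rng.standard_normal(512)  # noisy correlated proxy

rho, _ = spearmanr(oracle, taylor_fo)
print(f"Spearman rank correlation: {rho:.3f}")
```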
ImageNet
When applied to deeper networks such as ResNet-101, VGG11-BN, and DenseNet-201 trained on ImageNet, the proposed methods significantly outperform previously reported approaches. For instance, the Taylor-FO criterion pruned ResNet-101 to a 40% reduction in FLOPs at a cost of only 0.02% in top-1 accuracy. Reducing computational complexity without compromising performance is crucial for deployment on resource-constrained devices such as mobile phones and IoT hardware.
Implications
Practical Implications:
- Efficiency in Deployment: The pruning methods can dramatically reduce inference time and storage requirements, making it feasible to deploy sophisticated models on devices with limited resources.
- Energy Savings: The reduced computational load translates to lower energy consumption, which is vital for battery-powered devices.
Theoretical Implications:
- Re-evaluation of Weight Magnitude: This work challenges the conventional reliance on weight magnitude for pruning decisions, showing that gradient-based criteria are better correlated with true neuron importance.
- Generalizability: The finding that Taylor-FO can be applied globally without per-layer sensitivity analysis simplifies the pruning process, making it more accessible and applicable to a wide range of network architectures.
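Because the scores are comparable across layers, pruning can select filters from a single global ranking instead of fixing a budget per layer. The sketch below illustrates that global selection step, reusing the hypothetical `scores` dict from the earlier snippet; the actual filter removal and iterative fine-tuning loop are omitted.

```python
def least_important_global(scores, n):
    """Flatten {layer: per-filter scores} into one global ranking and
    return the n least-important filters as (layer, filter_index) pairs."""
    flat = [(layer, i, float(s))
            for layer, per_filter in scores.items()
            for i, s in enumerate(per_filter)]
    flat.sort(key=lambda item: item[2])
    return [(layer, i) for layer, i, _ in flat[:n]]

# e.g. with the `scores` from the Taylor-FO sketch:
# to_prune = least_important_global(scores, n=16)
```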
Future Directions
Given these promising results, future research could further explore:
- Adaptive Pruning Schedules: Dynamic pruning schedules that adapt to intermediate training performance could further improve efficiency.
- Integration with Quantization: Combining pruning with other model compression techniques such as quantization could yield more compact and efficient models.
- Theoretical Analysis of Second-Order Methods: While this work primarily focuses on first-order methods due to computational constraints, detailed analysis and optimization of second-order methods might unlock further improvements.
In conclusion, the paper "Importance Estimation for Neural Network Pruning" provides substantial advancements in pruning techniques, balancing computational efficiency with model accuracy. Its novel gradient-based criteria, verified through extensive experiments, mark a noteworthy step forward in the practical implementation and theoretical understanding of network pruning.