Dynamic Model Pruning with Feedback
The paper "Dynamic Model Pruning with Feedback" introduces a novel approach to model compression designed to address the challenges of deploying large-scale neural networks on resource-constrained devices. The proposed method, termed Dynamic Pruning with Feedback (DPF), dynamically adjusts the sparsity pattern of a neural network during training through a feedback mechanism that reactivates weights that were prematurely pruned. This departs from existing pruning strategies by removing the need for a separate retraining phase, and the resulting sparse models perform competitively against state-of-the-art dense models while surpassing previously established pruning techniques.
Key Contributions
- Dynamic Sparsity Pattern Allocation: The core of the proposed approach is its dynamic reallocation of the pruning mask during training. By leveraging a feedback signal, it corrects the course of pruning, reactivating weights that a strict one-shot criterion would have removed permanently, and thereby improves the accuracy of the resulting sparse model.
- Single-Pass Training: Unlike traditional pruning pipelines, which train a dense model, prune some weights, and then fine-tune the pruned model (thus requiring multiple passes), DPF achieves its results in a single training run. This reduces both computational cost and training complexity.
- Empirical Evaluation on Established Datasets: The efficacy of DPF was evaluated on well-known datasets such as CIFAR-10 and ImageNet. Sparse models trained with DPF achieve accuracy on par with their dense counterparts and exceed the accuracy of models produced by existing pruning methodologies.
- Minimal Hyperparameter Tuning: A notable practical advantage of DPF is that it requires minimal hyperparameter tuning compared to other dynamic pruning methods such as Dynamic Sparse Reparameterization (DSR) or Sparse Momentum (SM). This simplicity makes it more accessible and scalable across different architectures.
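The feedback mechanism described in these contributions can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes simple magnitude-based unstructured pruning on a toy quadratic loss, with the mask recomputed every step. The key point is that the gradient is evaluated at the pruned weights but applied to the dense weights, so pruned coordinates keep accumulating updates and can be reactivated when the mask is refreshed.

```python
import numpy as np

def magnitude_mask(w, sparsity):
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(round((1.0 - sparsity) * w.size))
    if k == 0:
        return np.zeros_like(w)
    thresh = np.sort(np.abs(w).ravel())[-k]
    return (np.abs(w) >= thresh).astype(w.dtype)

def dpf_train(grad_fn, w0, sparsity=0.5, lr=0.1, steps=200, reprune_every=1):
    """Sketch of dynamic pruning with feedback on a generic differentiable loss.

    The loss/gradient is evaluated on the pruned weights w_tilde = mask * w,
    but the update is applied to the dense weights w.  Pruned coordinates
    therefore keep receiving gradient signal and can grow back above the
    magnitude threshold, at which point a mask refresh reactivates them.
    """
    w = w0.copy()
    for t in range(steps):
        if t % reprune_every == 0:
            mask = magnitude_mask(w, sparsity)  # dynamic mask reallocation
        w_tilde = mask * w                      # sparse model used in forward/backward
        w -= lr * grad_fn(w_tilde)              # feedback: dense weights are updated
    return mask * w                             # final sparse model

# Toy usage: quadratic loss f(w) = 0.5 * ||w - w_star||^2 with gradient w - w_star.
w_star = np.array([3.0, 0.01, -2.0, 0.02])
w_sparse = dpf_train(lambda w: w - w_star, np.zeros(4), sparsity=0.5)
```

On this toy problem, the two large coordinates are kept and driven to their targets while the two near-zero coordinates stay pruned, yielding a 50%-sparse solution close to `[3.0, 0, -2.0, 0]`.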
Theoretical Implications
The authors provide a convergence analysis for both convex and non-convex loss functions. For strongly convex problems, the paper establishes that DPF converges to a neighborhood of the optimal solution, with an error bounded by a function of the sparsity-pattern quality and the training time. For non-convex problems, the analysis shows convergence to a neighborhood of a stationary point with an analogous error term. This theoretical foundation underscores the robustness of DPF as a general pruning framework: it maintains the trajectory towards minimizing the loss despite aggressive sparsification.
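Schematically, the strongly convex guarantee has the following shape (a paraphrase of the bound's structure, not the paper's exact statement or constants):

```latex
\mathbb{E}\, f(\tilde{\mathbf{w}}_T) - f(\mathbf{w}^\star)
\;\le\;
\underbrace{\mathcal{O}\!\left(\frac{1}{T}\right)}_{\text{optimization error}}
\;+\;
\underbrace{\mathcal{O}\!\left(\max_{t \le T} \mathbb{E}\,\|\mathbf{w}_t - \tilde{\mathbf{w}}_t\|^2\right)}_{\text{pruning-induced error}}
```

Here \(\tilde{\mathbf{w}}_t = \mathbf{m}_t \odot \mathbf{w}_t\) denotes the pruned iterate at step \(t\). The first term decays with training time \(T\); the second shrinks as the mask better captures the important coordinates, which is precisely what the feedback mechanism drives toward.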
Empirical Results
Numerical results indicate that, on average, DPF provides state-of-the-art performance in terms of sparsity and prediction accuracy over a range of neural networks, such as ResNet and WideResNet, on CIFAR-10 and ImageNet datasets. For example, DPF achieved superior results for sparsity levels up to 99%, a point at which other methods fail to maintain competitive accuracy. By effectively balancing sparsity and accuracy, DPF offers practical advantages in deploying deep learning solutions on devices where memory and processing power are limited.
Future Directions
While the paper successfully demonstrates substantial progress in pruning methodologies, it opens several avenues for future research. Potential directions include extending DPF to structured pruning, exploring more complex feedback mechanisms, or integrating DPF within broader neural architecture search frameworks to optimize energy efficiency and latency on specific hardware platforms. Moreover, applying DPF to different types of neural network architectures, beyond those explored, could offer insights into its adaptability and generalizability across the machine learning landscape.
In conclusion, the Dynamic Pruning with Feedback method stands as a significant contribution to the field of neural network compression. Its innovative use of feedback for sparsity adjustment during training aligns well with the growing need for efficient, deployable AI models without sacrificing performance, positioning it as an attractive tool for further research and practical application.