
Comparing Rewinding and Fine-tuning in Neural Network Pruning (2003.02389v1)

Published 5 Mar 2020 in cs.LG and stat.ML

Abstract: Many neural network pruning algorithms proceed in three steps: train the network to completion, remove unwanted structure to compress the network, and retrain the remaining structure to recover lost accuracy. The standard retraining technique, fine-tuning, trains the unpruned weights from their final trained values using a small fixed learning rate. In this paper, we compare fine-tuning to alternative retraining techniques. Weight rewinding (as proposed by Frankle et al. (2019)) rewinds unpruned weights to their values from earlier in training and retrains them from there using the original training schedule. Learning rate rewinding (which we propose) trains the unpruned weights from their final values using the same learning rate schedule as weight rewinding. Both rewinding techniques outperform fine-tuning, forming the basis of a network-agnostic pruning algorithm that matches the accuracy and compression ratios of several more network-specific state-of-the-art techniques.

Authors (3)
  1. Alex Renda (11 papers)
  2. Jonathan Frankle (37 papers)
  3. Michael Carbin (45 papers)
Citations (365)

Summary

  • The paper demonstrates that weight and learning rate rewinding outperform fine-tuning in maintaining high accuracy post-pruning.
  • It evaluates these techniques on both structured and unstructured pruning using networks like ResNet-56, ResNet-50, and GNMT with CIFAR-10, ImageNet, and WMT16 EN-DE datasets.
  • The study introduces a simplified pruning algorithm based on learning rate rewinding that minimizes hyperparameter tuning while achieving state-of-the-art efficiency.

Comparative Analysis of Neural Network Retraining Techniques for Pruning

The paper provides an in-depth analysis of three retraining techniques applied after pruning: fine-tuning, weight rewinding, and the newly proposed learning rate rewinding. The research aims to evaluate these techniques on their ability to maintain or enhance the accuracy of pruned networks while improving efficiency and minimizing search cost. The work is rooted in neural network pruning, a prominent procedure that reduces model complexity by eliminating unnecessary network components, delivering computational benefits without compromising performance.
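To make the contrast concrete, the following minimal Python sketch (not the authors' code) spells out what each technique retrains from and which learning-rate schedule it replays; the step-decay schedule, epoch counts, and learning-rate values are illustrative assumptions rather than settings from the paper.

```python
# Illustrative sketch only: the schedule shape, epoch counts, and learning
# rates below are assumed for this example, not taken from the paper.

def original_lr_schedule(epoch):
    """An example step-decay schedule of the kind commonly used for ResNets."""
    if epoch < 91:
        return 0.1
    elif epoch < 136:
        return 0.01
    return 0.001

def fine_tuning_plan(final_weights, retrain_epochs=40):
    # Fine-tuning: retrain the surviving weights from their final trained
    # values using a small, fixed learning rate.
    return final_weights, [0.001] * retrain_epochs

def weight_rewinding_plan(weights_at_epoch_t, t, total_epochs=182):
    # Weight rewinding: rewind surviving weights to their values at epoch t
    # and replay the original schedule from epoch t to the end of training.
    lrs = [original_lr_schedule(e) for e in range(t, total_epochs)]
    return weights_at_epoch_t, lrs

def lr_rewinding_plan(final_weights, t, total_epochs=182):
    # Learning rate rewinding: keep the final weight values, but replay the
    # same learning-rate schedule from epoch t onward as weight rewinding.
    lrs = [original_lr_schedule(e) for e in range(t, total_epochs)]
    return final_weights, lrs
```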

Methodology and Evaluation Criteria

Neural network pruning can be broadly categorized into unstructured and structured methods. The former deletes individual weights anywhere in the network, while the latter removes entire elements such as filters or channels, which often yields immediate speedups on standard hardware (a minimal sketch of both families follows the list below). The researchers evaluate the three retraining techniques across both pruning categories using ResNet-56 on CIFAR-10, ResNet-50 on ImageNet, and GNMT on WMT16 EN-DE. The focus metrics for evaluation are:

  • Accuracy: The ability of the pruned network to perform on unseen data as measured by test accuracy.
  • Efficiency: Measured primarily by parameter count, and extended to floating-point operations (FLOPs) in the paper's appendix.
  • Search Cost: The computational cost to find and retrain the pruned network, approximated by retraining epochs needed.
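As a concrete illustration of the two pruning families described above, the sketch below builds an unstructured magnitude-pruning mask and a structured filter-pruning mask for a toy convolutional weight tensor; the tensor shape, sparsity levels, and L1 filter criterion are illustrative assumptions, not the exact settings evaluated in the paper.

```python
import numpy as np

# Toy conv weight tensor: (out_channels, in_channels, kernel_h, kernel_w).
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4, 3, 3))

# Unstructured pruning: zero out individual weights with the smallest
# magnitudes anywhere in the tensor (here, the smallest 80%).
threshold = np.quantile(np.abs(weights), 0.8)
unstructured_mask = np.abs(weights) > threshold

# Structured pruning: remove whole filters (output channels), here the
# half with the smallest L1 norms.
filter_norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
kept_filters = np.argsort(filter_norms)[weights.shape[0] // 2:]
structured_mask = np.zeros_like(weights, dtype=bool)
structured_mask[kept_filters] = True

# Fraction of weights that survive under each scheme.
print(unstructured_mask.mean(), structured_mask.mean())
```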

Key Findings

Weight Rewinding and Learning Rate Rewinding vs. Fine-Tuning: The analysis reveals that both weight rewinding and learning rate rewinding consistently outperform fine-tuning in maintaining high accuracy across varying compression ratios. Weight rewinding, in particular, supports the lottery ticket hypothesis by showing that high-performing subnetworks can be retrained from weights found early in training. Learning rate rewinding simplifies this further: it rewinds only the learning rate schedule rather than the weights, yet matches or exceeds the accuracy of weight rewinding.

Iterative Pruning Superiority: Combined with iterative pruning, learning rate rewinding, and to a large extent weight rewinding, yields better accuracy-versus-compression tradeoffs than more conventional, complex methods that require network-specific hyperparameters or reinforcement learning strategies. In particular, learning rate rewinding reaches state-of-the-art compression ratios without dropping accuracy below that of the original network.

Simplified Pruning Algorithm: Leveraging these findings, a simplified pruning algorithm based on learning rate rewinding is proposed. This method circumvents the need for complex hyperparameter tuning typical in other state-of-the-art solutions and effectively serves as a default choice for practical implementations.
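The loop below is a hedged sketch of such a simplified algorithm as described above: iteratively prune, then retrain with learning rate rewinding by replaying the tail of the original schedule. The `train` and `prune` callables, the per-iteration pruning fraction, and the assumption that `prune` returns the current sparsity are placeholders for illustration, not the authors' implementation.

```python
# Hedged sketch of iterative pruning with learning rate rewinding.
# `train(model, lrs)` and `prune(model, fraction) -> sparsity` are assumed
# placeholder interfaces; the 20% per-iteration fraction is a common choice
# in iterative magnitude pruning, used here purely for illustration.

PRUNE_FRACTION = 0.2  # fraction of remaining weights removed each iteration

def iterative_lr_rewinding(model, lr_schedule, total_epochs, rewind_epochs,
                           target_sparsity, train, prune):
    """Iteratively prune and retrain, replaying the tail of the LR schedule."""
    # 1. Train the dense network to completion with the original schedule.
    train(model, [lr_schedule(e) for e in range(total_epochs)])

    sparsity = 0.0
    while sparsity < target_sparsity:
        # 2. Remove a fixed fraction of the remaining weights (e.g. by magnitude).
        sparsity = prune(model, PRUNE_FRACTION)
        # 3. Learning rate rewinding: keep the surviving weight values and
        #    retrain using the final `rewind_epochs` of the original schedule.
        tail_lrs = [lr_schedule(e)
                    for e in range(total_epochs - rewind_epochs, total_epochs)]
        train(model, tail_lrs)
    return model
```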

Theoretical and Practical Implications

The paper provides significant insights into retraining after pruning, indicating that the choice of retraining technique can profoundly impact the balance between accuracy retention and computational efficiency. By demonstrating that retraining with an appropriately rewound learning-rate schedule can produce models more accurate than those obtained through standard fine-tuning, this work lays the groundwork for more practical, less tuning-intensive pruning methodologies.

Potential future work could explore further decoupling of retraining hyperparameters from the original training schedule to push the boundaries of pruning effectiveness while limiting the resource burden. A continued focus on methods that need little network-specific hyperparameter tuning would enhance deployability across diverse applications without the substantial tuning that current methods require.

In conclusion, by critically evaluating and contrasting existing and novel retraining approaches, the paper contributes to advancing the theoretical framework and practical toolkit available for neural network compression through pruning. Such advancements significantly facilitate deploying more efficient models without forfeiting performance, crucial in both research settings and real-world applications.
