- The paper demonstrates that progressive freezing of layers reduces training time with only a small drop in accuracy, achieving speedups of up to 20% on DenseNets.
- It anneals each layer's learning rate to zero with a cosine schedule, assigning per-layer freeze times via linear or cubic schedules so that early layers are frozen during training.
- Experimental results on DenseNet, Wide ResNet, and VGG highlight an architecture-dependent efficacy, supporting rapid prototyping in resource-constrained environments.
FreezeOut: Accelerate Training by Progressively Freezing Layers
The paper "FreezeOut: Accelerate Training by Progressively Freezing Layers" presents a novel approach to reduce the computational cost of training deep neural networks without significant performance degradation. The proposed technique, FreezeOut, is particularly aligned with reducing training time by freezing out layers progressively throughout the training schedule. This work asserts that early layers in deep neural architectures can reach adequate configurations faster and hence do not necessitate intensive fine-tuning as the deeper layers do.
Methodological Overview
FreezeOut freezes layers according to a per-layer schedule built on cosine annealing, originally proposed in SGDR. The core idea is to gradually lower the learning rates of the earliest layers to zero during training, after which those layers are put into inference mode and excluded from the backward pass. This differs from DropOut and Stochastic Depth: rather than randomly dropping units or layers at each training iteration, FreezeOut permanently freezes layers, and it does not rely on residual connections to realize its computational savings.
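The per-layer mechanics can be sketched as follows, assuming a PyTorch optimizer configured with one parameter group per layer. The function name `freezeout_step`, the shared `base_lr`, and the freeze fractions in `t_freeze` are illustrative choices, not the paper's reference implementation.

```python
import math

def freezeout_step(optimizer, layers, t, t_freeze, base_lr):
    """Per-layer cosine annealing with progressive freezing (illustrative sketch).

    optimizer : torch.optim.Optimizer with one param group per layer, in layer order
    layers    : list of nn.Module, ordered from input to output
    t         : current training progress in [0, 1]
    t_freeze  : per-layer freeze fractions t_i in (0, 1]
    base_lr   : initial learning rate shared by all layers (assumption)
    """
    for group, layer, t_i in zip(optimizer.param_groups, layers, t_freeze):
        if t < t_i:
            # Cosine-anneal this layer's learning rate to zero by its freeze point t_i.
            group['lr'] = 0.5 * base_lr * (1.0 + math.cos(math.pi * t / t_i))
        else:
            # Frozen: zero learning rate and stop tracking gradients. Because frozen
            # layers form a prefix of the network, autograd no longer builds a graph
            # for them, so they are effectively excluded from the backward pass.
            group['lr'] = 0.0
            for p in layer.parameters():
                p.requires_grad_(False)
```

Calling this once per iteration, with `t` set to the fraction of training completed, reproduces the behaviour described above: each layer's learning rate follows its own cosine curve, and the layer drops out of the backward pass once that curve reaches zero.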
Two scheduling strategies are presented: linear and cubic. In both, each layer is assigned a freeze point and its learning rate is annealed to zero by that point. Under cubic scheduling, early layers are frozen relatively sooner than under linear scheduling, shifting more of the remaining computation toward the deeper layers; a sketch of the two schedules follows below.
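A compact way to express the two schedules, under one plausible reading of the paper; the hyperparameter `t0` for the first layer's freeze point and the exact cubic parameterization are assumptions to be checked against the original.

```python
def freeze_times(num_layers, t0=0.5, cubic=True):
    """Assign each layer a freeze point t_i as a fraction of total training (sketch).

    Linear: t_i spaced evenly from t0 (first layer) to 1.0 (last layer).
    Cubic : the linearly spaced values are cubed, so early layers freeze
            noticeably sooner and deeper layers keep most of their schedule.
    """
    span = max(num_layers - 1, 1)
    linear = [t0 + (1.0 - t0) * i / span for i in range(num_layers)]
    return [t ** 3 for t in linear] if cubic else linear
```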
Experimental Evaluation
The paper provides a detailed empirical evaluation of FreezeOut on diverse architectures—DenseNets, Wide ResNets, and VGG—across established datasets like CIFAR-10 and CIFAR-100. Results indicate that FreezeOut achieves considerable wall-clock time reductions. Specifically:
- On DenseNets, a speedup of up to 20% is achieved at the cost of roughly 3% loss in test accuracy.
- For Wide ResNets, FreezeOut not only accelerates training but in some cases also improves accuracy at the same number of epochs.
- VGG architectures did not benefit significantly from FreezeOut, suggesting that skip connections are a prerequisite for the approach's effectiveness.
These findings are supported by a computational cost model that predicts the expected speedups; the predictions are confirmed by the observed wall-clock measurements, validating the practical applicability of FreezeOut.
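The spirit of such a cost model can be captured in a few lines. The assumption that a layer's backward pass costs about as much as its forward pass (`backward_ratio=1.0`) is a simplification for illustration and may differ from the paper's exact accounting.

```python
def estimated_speedup(layer_costs, t_freeze, backward_ratio=1.0):
    """Estimate the fraction of training compute saved by FreezeOut-style freezing.

    A layer frozen at t_i skips its backward pass for the remaining (1 - t_i)
    fraction of training; forward passes always run.

    layer_costs    : per-layer compute cost per iteration (e.g., FLOPs)
    t_freeze       : per-layer freeze fractions t_i in (0, 1]
    backward_ratio : backward-pass cost relative to forward (assumption)
    """
    full = sum(c * (1.0 + backward_ratio) for c in layer_costs)
    saved = sum(c * backward_ratio * (1.0 - t) for c, t in zip(layer_costs, t_freeze))
    return saved / full  # fraction of total training compute saved
```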
Implications and Future Directions
The capability of FreezeOut to achieve reduced training times without a dramatic impact on model accuracy has practical implications in scenarios that require iterative prototyping and hyperparameter tuning. The technique provides a tangible trade-off mechanism for resource-constrained environments or rapid prototyping cycles.
The observed architecture-specific efficacy (DenseNets and Wide ResNets versus VGG) opens avenues for future exploration. Further investigation into the interaction between connection types in neural topologies and the FreezeOut mechanism could yield optimized or hybrid strategies that extend its versatility. Additionally, combining FreezeOut with other regularization techniques could improve model generalization while maintaining computational efficiency.
In conclusion, FreezeOut is a practical method for accelerating deep learning workflows, offering a tunable trade-off between computational demand and model performance. The results presented in the paper contribute to the ongoing work on efficient neural network training, inviting further research and adaptation in more complex application domains.