- The paper presents ESAM, which reduces SAM's extra computational overhead from roughly 100% to about 40% over a base optimizer while preserving SAM's generalization benefits.
- It employs Stochastic Weight Perturbation to perturb only a random subset of model weights and Sharpness-Sensitive Data Selection to compute sharpness only on the training samples most sensitive to it.
- Empirical results on CIFAR and ImageNet validate ESAM’s efficiency and show comparable or superior accuracy to traditional SAM.
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
The paper focuses on improving the efficiency of sharpness-aware minimization (SAM) for neural network training, introducing the Efficient Sharpness Aware Minimizer (ESAM). SAM was originally motivated by the observation that overparametrized deep neural networks (DNNs) can fit training data well yet still incur substantial generalization error, and that converging to flat minima of the loss landscape mitigates this error. However, SAM roughly doubles the training cost of standard optimizers such as SGD, because estimating sharpness requires an additional forward and backward pass at a perturbed set of weights in every iteration.
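To illustrate why SAM doubles the per-step cost, the following PyTorch-style sketch performs the two forward-backward passes described above. It is a minimal sketch rather than the authors' implementation; `model`, `loss_fn`, `x`, `y`, `base_opt`, and `rho` are illustrative placeholders.

```python
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    # First forward-backward pass: gradient of the loss at the current weights w.
    loss_fn(model(x), y).backward()

    # Perturb the weights along the normalized gradient: eps = rho * g / ||g||.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    perturbed = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)                      # move to the perturbed point w + eps
            perturbed.append((p, eps))
    model.zero_grad()

    # Second forward-backward pass: gradient at the perturbed weights w + eps.
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then update w with the base optimizer (e.g. SGD)
    # using the gradient computed at w + eps.
    with torch.no_grad():
        for p, eps in perturbed:
            p.sub_(eps)
    base_opt.step()
    base_opt.zero_grad()
```

The two full forward-backward passes per update are exactly the extra work that ESAM's components aim to cut down.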
The proposed ESAM aims to retain SAM's ability to improve DNN generalization while reducing its computational demands. ESAM combines two techniques: Stochastic Weight Perturbation (SWP) and Sharpness-Sensitive Data Selection (SDS). SWP randomly selects a subset of model weights to perturb during the sharpness estimation step, thereby reducing the dimensionality of the perturbation and its computational cost. SDS, in turn, computes and optimizes sharpness only on the subset of training samples whose loss is most sensitive to the weight perturbation, preserving the quality of the sharpness approximation while processing less data; a combined sketch follows below.
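The sketch below shows how the two ideas might be combined in a single training step, under simplifying assumptions: `loss_fn` returns per-sample losses (`reduction='none'`), SWP is applied at the level of whole weight tensors rather than individual elements, and the fraction names `beta` (weights perturbed) and `gamma` (samples kept) are hypothetical rather than the paper's exact hyperparameters.

```python
import torch

def esam_like_step(model, loss_fn, x, y, base_opt, rho=0.05, beta=0.5, gamma=0.5):
    # First pass: per-sample losses and the gradient at the current weights w.
    per_sample_loss = loss_fn(model(x), y)          # shape: (batch,)
    per_sample_loss.mean().backward()

    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)

    # SWP: perturb only a randomly chosen subset of the weight tensors.
    perturbed = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None or torch.rand(1).item() > beta:
                continue
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)
            perturbed.append((p, eps))
    model.zero_grad()

    # SDS: forward once on the whole batch at w + eps, then backpropagate only
    # through the samples whose loss rose the most under the perturbation.
    loss_after = loss_fn(model(x), y)               # per-sample, at w + eps
    with torch.no_grad():
        rise = loss_after - per_sample_loss
        k = max(1, int(gamma * x.size(0)))
        idx = torch.topk(rise, k).indices
    loss_after[idx].mean().backward()

    # Restore w and update with the base optimizer.
    with torch.no_grad():
        for p, eps in perturbed:
            p.sub_(eps)
    base_opt.step()
    base_opt.zero_grad()
```

Restricting the second backward pass to the sharpness-sensitive samples, and the perturbation to a subset of weights, is where the savings over plain SAM would come from in this sketch.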
Extensive experimentation on CIFAR and ImageNet substantiates ESAM's improvement over SAM in both efficiency and generalization accuracy. For instance, ESAM reduces the computational overhead over a base optimizer from the 100% extra cost incurred by SAM to roughly 40%, while matching or exceeding SAM's accuracy. The empirical evidence presented in the paper highlights ESAM's favorable trade-off between compute and performance, making sharpness-aware training more practical for real-world deep learning workloads.
In the broader context of training overparameterized DNNs, ESAM's innovations reduce the practical barriers to deploying SAM and related methods. Theoretical analysis supports ESAM's design and adds to the literature on how loss-landscape geometry affects generalization. The authors suggest future work on integrating ESAM into mixed computational architectures and on datasets larger than the current benchmarks.
This paper presents a meaningful contribution to improving training practices in deep learning, pointing towards more efficient and resource-conscious approaches without sacrificing the robustness of trained models. Further research may extend ESAM's principles to address additional complexities inherent in emerging neural network models and diverse applications, thereby continuing to push boundaries in neural optimization strategies.