Analysis of the Impact of Augmentation Multiplicity on Model Generalization
- The paper shows that drawing multiple augmentation samples per image significantly improves test accuracy on CIFAR-100 and ImageNet by reducing detrimental gradient variance.
- The study reveals that higher augmentation multiplicity stabilizes training at elevated learning rates, improving validation performance despite slower convergence on the training loss.
- The paper applies the strategy to high-performance models such as NFNet-F5, achieving 86.8% top-1 accuracy on ImageNet without extra data, highlighting its practical impact.
This paper presents a rigorous empirical analysis of how augmentation multiplicity affects the generalization performance of deep networks on computer vision tasks. It systematically evaluates how varying the number of augmentation samples drawn per unique image influences model performance, focusing on the training and evaluation of deep residual networks (ResNets).
The authors address an aspect of data augmentation that prior work has largely overlooked: augmentation alters gradient estimates in two distinct ways, by biasing the expected gradient and by adding variance to it. They argue, supported by extensive empirical evidence, that augmentation's benefit stems from the bias it introduces, while the added variance is detrimental to generalization. This finding contradicts the previous assumption that both bias and variance contribute positively to the benefits of common regularization techniques such as Dropout.
Key Findings
Through a comprehensive set of experiments on CIFAR-100 and ImageNet datasets using ResNet architectures, the authors report several key findings:
- Augmentation Multiplicity and Test Accuracy:
- Drawing multiple augmentation samples per image significantly enhances test accuracy for both small and large batch sizes, whether the compute budget is held fixed in terms of parameter updates or in terms of gradient evaluations.
- The benefit persists even when the batch size is held constant, so that fewer unique images populate each batch; this rules out the idea that greater diversity of unique images within a batch is the primary driver of the improvement.
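The batch construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code, and the `augment` function is a toy stand-in for the crop-and-flip pipeline typically used:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    # Hypothetical augmentation: random horizontal flip plus small
    # Gaussian noise, standing in for a real crop/flip pipeline.
    if rng.random() < 0.5:
        image = image[:, ::-1]
    return image + rng.normal(0.0, 0.01, image.shape)

def make_batch(images, batch_size, multiplicity, rng):
    """Build a batch of `batch_size` samples from only
    batch_size // multiplicity unique images, each augmented
    `multiplicity` times, so the batch size stays constant."""
    n_unique = batch_size // multiplicity
    idx = rng.choice(len(images), size=n_unique, replace=False)
    batch = [augment(images[i], rng) for i in idx for _ in range(multiplicity)]
    return np.stack(batch)

images = rng.normal(size=(100, 32, 32))   # toy stand-in for CIFAR images
batch = make_batch(images, batch_size=64, multiplicity=8, rng=rng)
print(batch.shape)  # (64, 32, 32): 8 unique images x 8 augmentations each
```

Each augmentation sample is drawn independently, so a multiplicity of 8 turns each of the 8 unique images into 8 distinct training examples within the same batch.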
- Influence on Training Dynamics:
- Large augmentation multiplicities yield superior test performance by stabilizing training at higher learning rates, mitigating the detrimental variance introduced by the data augmentation process itself.
- Although large augmentation multiplicities slow convergence on the training set, they exhibit superior convergence when measured on held-out validation data.
- Implications for High-Performance Architectures:
- Applying augmentation multiplicity to state-of-the-art NFNet models, specifically NFNet-F5, yields substantial accuracy gains on ImageNet, achieving 86.8% top-1 accuracy without additional data. This highlights the broader applicability of the training strategy to more complex and highly regularized models.
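The variance-reduction mechanism behind the training-dynamics findings can be illustrated with a toy simulation (an assumption-laden sketch, not the paper's experiment): if each image's stochastic gradient is modeled as its true per-image gradient plus independent augmentation noise, then averaging M augmentations of the same image shrinks the augmentation-induced component of the batch-gradient variance by roughly 1/M, while leaving its mean (the bias term) unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: 1000 "images", each with a fixed scalar true gradient;
# every augmentation sample adds independent noise of scale aug_std.
true_grad = rng.normal(size=1000)
aug_std = 1.0

def batch_grad(multiplicity, n_unique=32, trials=2000):
    """Monte Carlo estimates of the batch gradient: sample n_unique
    images, average `multiplicity` augmentation noises per image."""
    ests = []
    for _ in range(trials):
        idx = rng.choice(true_grad.size, size=n_unique, replace=False)
        noise = rng.normal(0.0, aug_std, size=(n_unique, multiplicity)).mean(axis=1)
        ests.append((true_grad[idx] + noise).mean())
    return np.array(ests)

# Variance of the batch gradient falls as multiplicity grows; its mean
# stays at the (augmentation-biased) expected gradient.
for m in (1, 4, 16):
    est = batch_grad(m)
    print(f"M={m:2d}  mean={est.mean():+.3f}  var={est.var():.4f}")
```

The residual variance at large M comes from sampling which unique images enter the batch; only the augmentation-noise component is averaged away, which matches the summary's point that multiplicity targets the detrimental variance without removing the beneficial bias.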
Theoretical and Practical Implications
The findings in this paper imply significant practical and theoretical directions for future research and application:
- Algorithm Design: The results reinforce the case for augmentation multiplicity as a default choice when designing training regimes for deep vision models, and suggest reevaluating the entrenched norm of drawing a single augmentation sample per image.
- Model Training Efficiency: From a computational perspective, the ability to enhance generalization while fixing or even decreasing the computational cost of training (in terms of parameter updates) is particularly valuable. The proposed method warrants further exploration in the context of resource-constrained learning environments.
- Regularization Analysis: Distinguishing the beneficial (bias) from the detrimental (variance) effects of data-driven regularization schemes opens avenues for manipulating each separately. The paper's insights challenge existing beliefs about stochastic optimization and its interactions with different sources of variance, urging a reconsideration of underlying assumptions.
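The fixed-cost claim under Model Training Efficiency can be made concrete with a quick accounting sketch: with batch size and update count held constant, raising the multiplicity leaves the number of gradient evaluations unchanged and only reduces the number of unique images per batch (the function below is illustrative, not from the paper):

```python
def compute_budget(n_updates, batch_size, multiplicity):
    """Simple accounting, assuming batch size and update count are fixed:
    total gradient evaluations stay constant, while the number of unique
    images per batch shrinks by the multiplicity factor."""
    return {
        "gradient_evals": n_updates * batch_size,
        "unique_images_per_batch": batch_size // multiplicity,
    }

for m in (1, 8):
    print(m, compute_budget(n_updates=10_000, batch_size=256, multiplicity=m))
```

Under this accounting, multiplicity 8 costs no more compute per update than standard single-sample augmentation; the trade-off is that fewer unique images are visited per epoch.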
Future Prospects in AI Research
The insights from these experiments advocate for rethinking model training and optimization strategies, especially under resource limitations. Future AI developments could focus on refining data augmentation procedures and multi-sample tactics tailored not just for image classification but across other domains of deep learning including natural language processing and reinforcement learning, which similarly grapple with generalization challenges.
By providing a clear, empirically backed narrative around the role of augmentation multiplicity, this work sets a benchmark for future research initiatives aimed at disentangling the multifaceted interactions between data augmentation and gradient-based learning methods.