
94% on CIFAR-10 in 3.29 Seconds on a Single GPU

Published 30 Mar 2024 in cs.LG and cs.CV (arXiv:2404.00498v2)

Abstract: CIFAR-10 is among the most widely used datasets in machine learning, facilitating thousands of research projects per year. To accelerate research and reduce the cost of experiments, we introduce training methods for CIFAR-10 which reach 94% accuracy in 3.29 seconds, 95% in 10.4 seconds, and 96% in 46.3 seconds, when run on a single NVIDIA A100 GPU. As one factor contributing to these training speeds, we propose a derandomized variant of horizontal flipping augmentation, which we show improves over the standard method in every case where flipping is beneficial over no flipping at all. Our code is released at https://github.com/KellerJordan/cifar10-airbench.


Summary

  • The paper demonstrates a novel training method that attains 94% accuracy on CIFAR-10 in just 3.29 seconds, setting a new state-of-the-art with a 1.9x speed improvement.
  • It leverages an optimized CNN architecture and a deterministic alternating flip augmentation to enhance data variability and GPU utilization.
  • The efficient strategy significantly reduces computational costs and paves the way for rapid deep learning experimentation and hyperparameter tuning.

Achieving 94% Accuracy on CIFAR-10 in Unprecedented Time with Single-GPU Training

Introduction

The CIFAR-10 dataset, a cornerstone of machine learning research, supports extensive experimentation thanks to its manageable size and non-trivial difficulty. Efforts to accelerate CIFAR-10 training aim both to shorten individual training runs and to conserve computational resources. In this regard, we present a training approach that reaches 94% accuracy in 3.29 seconds, 95% in 10.4 seconds, and 96% in 46.3 seconds on a single NVIDIA A100 GPU. These results represent a 1.9x speedup over the previous state of the art, a significant efficiency gain for training deep learning models.

Methods Overview

Our approach combines optimizations across network architecture, data augmentation, and training strategy. Central to our efficiency gains is a derandomized variant of horizontal flipping augmentation, which ensures each training image is seen in both orientations across epochs rather than leaving this to chance.

Network Architecture and Training

The convolutional neural network (CNN), inspired by successful elements of prior work, includes alterations aimed at efficiency, such as a reduced output channel count and learnable biases added to the first convolution, enabling better GPU utilization.
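The cost impact of reducing channel counts can be illustrated with simple parameter arithmetic (the channel numbers below are hypothetical, not the paper's actual architecture): a 2D convolution with kernel size k has k*k*in*out weights plus one bias per output channel, so halving the output channels roughly halves its parameters and compute.

```python
def conv_params(in_ch, out_ch, k, bias=True):
    """Parameter count of a 2D convolution: k*k*in_ch*out_ch weights,
    plus one learnable bias per output channel when bias=True."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)

# Hypothetical channel counts for illustration only.
wide = conv_params(64, 128, 3, bias=False)    # 9 * 64 * 128 = 73728
narrow = conv_params(64, 64, 3, bias=False)   # 9 * 64 * 64  = 36864

# A first conv over RGB input: adding learnable biases costs only
# out_ch extra parameters, a negligible overhead.
first = conv_params(3, 24, 3, bias=True)      # 648 weights + 24 biases
```

The takeaway is that channel-count reductions dominate the cost trade-off, while the added first-conv biases are essentially free in parameter terms.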

Novel Data Augmentation

We propose a deterministic alternative to the commonly used random horizontal flipping, termed "alternating flip." This method ensures each training image is flipped in an alternating pattern across epochs, minimizing redundancy and enhancing learning efficacy.
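The mechanism can be sketched in pure Python (function and variable names here are illustrative, not taken from the released airbench code, which operates on GPU tensors): each image gets an initial orientation, and its orientation then toggles with the epoch parity, so every image is guaranteed to appear in both orientations over any two consecutive epochs.

```python
def hflip(img):
    """Mirror an image (given as a list of pixel rows) left-to-right."""
    return [row[::-1] for row in img]

def alternating_flip(dataset, epoch, initial_flip):
    """Deterministic 'alternating flip' sketch: image i is mirrored
    whenever its initial coin toss XOR the epoch parity is 1, so each
    image alternates between original and flipped on successive epochs."""
    return [
        hflip(img) if (initial_flip[i] ^ (epoch % 2)) else img
        for i, img in enumerate(dataset)
    ]

# Tiny 1x3 "images" to demonstrate the alternation.
data = [[[1, 2, 3]], [[4, 5, 6]]]
coins = [0, 1]  # per-image initial orientation (randomized in practice)
epoch0 = alternating_flip(data, epoch=0, initial_flip=coins)
epoch1 = alternating_flip(data, epoch=1, initial_flip=coins)
```

Unlike random flipping, which can show a model the same orientation of an image several epochs in a row, this schedule removes that redundancy by construction.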

Training Strategy

Leveraging previously introduced techniques like Nesterov SGD, lookahead optimization, and targeted learning rate scheduling, we refine the training regime to expedite convergence. Additionally, the deployment of multi-crop inference techniques during evaluation substantially contributes to reaching higher accuracies in reduced time frames.
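The evaluation-time idea can be illustrated with a minimal test-time-augmentation sketch (a simplified mirror-averaging variant under assumed interfaces, not the paper's exact multi-crop implementation): the model's class scores are averaged over several views of each input.

```python
def hflip(img):
    """Mirror an image (rows of pixels) left-to-right."""
    return [row[::-1] for row in img]

def tta_predict(model, img):
    """Average class scores over the original and mirrored views.
    `model` is any callable mapping an image to a list of class scores."""
    views = [img, hflip(img)]
    scores = [model(v) for v in views]
    n = len(scores)
    return [sum(s[c] for s in scores) / n for c in range(len(scores[0]))]

# Toy 'model': scores class 0 by the top-left pixel, class 1 by top-right.
toy_model = lambda im: [im[0][0], im[0][-1]]
img = [[1, 3]]
pred = tta_predict(toy_model, img)  # both views average to [2.0, 2.0]
```

Averaging over views makes the prediction invariant to the augmentation (mirroring the input leaves the averaged scores unchanged), which is what buys extra accuracy at a small inference cost.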

Results and Discussion

Our methodology not only sets new records for CIFAR-10 training efficiency but also demonstrates the importance of strategic data manipulation and advanced optimization techniques in accelerating neural network training. The implications extend beyond CIFAR-10, offering insights and tools applicable to machine learning tasks that require efficient training iterations.

Practical Implications

The ability to train models rapidly on CIFAR-10 opens the door to extensive experimental iteration, hyperparameter tuning, and architecture exploration at significantly lower computational cost. For projects that require many training runs, such as studies of data attribution or run-to-run training variance, our approach offers a practical way to keep resource demands manageable.

Theoretical Contributions

Our findings highlight the untapped potential of systematic data augmentation methods and refined training strategies, challenging the community to rethink conventional practices in model training. The introduction of the alternating flip technique, in particular, underlines the value of deterministic approaches in enhancing learning efficiency.

Future Directions

While our research delivers notable advancements in training speed and efficiency on CIFAR-10, the exploration of these methodologies in other domains remains an enticing future avenue. The generality and adaptability of our proposed techniques, particularly the alternating flip augmentation, call for further investigation into their application across different datasets and problem spaces.

In sum, this work not only advances CIFAR-10 training metrics but also contributes valuable techniques and insights to the broader machine learning community, driving forward the capabilities for efficient training of deep neural networks.
