- The paper introduces NFNets, normalizer-free networks that remove batch normalization and instead rely on Adaptive Gradient Clipping (AGC) to keep training stable at large batch sizes while reaching competitive accuracy.
- The method leverages Scaled Weight Standardization, extensive regularization, and optimized architecture design to reduce computational overhead and enhance convergence.
- Results demonstrate that NFNet-F1 trains up to 8.7 times faster than EfficientNet-B7 while matching its top-1 accuracy on ImageNet.
The paper "High-Performance Large-Scale Image Recognition Without Normalization" proposes a novel approach to train deep networks using Normalizer-Free ResNets, which eliminates the need for batch normalization (BN) and achieves state-of-the-art accuracies in large-scale image recognition tasks.
Overview and Motivation
Batch normalization is commonly used in deep networks to stabilize training, improve convergence, and enhance overall performance. However, BN introduces several drawbacks, such as dependence on batch size, increased computational overhead, and complications in distributed training. Despite recent progress in training deep ResNets without normalization layers, reaching competitive test accuracy has remained difficult. The paper introduces an adaptive gradient clipping technique to stabilize training in Normalizer-Free Networks (NFNets) and proposes architectures that outperform batch-normalized networks.
Figure 1: ImageNet Validation Accuracy vs Training Latency showing NFNet-F1 model achieves accuracy comparable to EfficientNet-B7, but with significantly faster training.
Elimination of Batch Normalization
Disadvantages of Batch Normalization
Batch normalization complicates distributed training by requiring batch statistics to be synchronized across devices, which leads to hardware-specific implementations and inconsistencies across platforms. BN also behaves differently at training and inference time, which often forces additional hyper-parameter tuning, and it adds memory and compute overhead.
Normalizer-Free Approach
Instead of relying on normalization, the proposed NFNets use Adaptive Gradient Clipping (AGC), which clips gradients based on the unit-wise ratio of gradient norms to parameter norms. This makes larger learning rates and batch sizes usable, improving training stability. The Normalizer-Free ResNet architecture applies Scaled Weight Standardization to suppress the mean shift in activations and relies on extensive regularization such as dropout and stochastic depth.
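To make Scaled Weight Standardization concrete, below is a minimal PyTorch sketch of a standardized convolution, assuming a ReLU network with a fixed gain of roughly 1.7139; the class name, defaults, and epsilon are illustrative choices rather than the paper's reference implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledStdConv2d(nn.Conv2d):
    """Convolution with Scaled Weight Standardization (illustrative sketch).

    Each output unit's weights are standardized over their fan-in and then
    rescaled by gamma / sqrt(fan_in), where gamma is a fixed gain chosen for
    the nonlinearity (about 1.7139 for ReLU). This keeps activations roughly
    variance-preserving without using batch statistics.
    """

    def __init__(self, *args, gamma: float = 1.7139, eps: float = 1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        self.gamma = gamma  # nonlinearity-dependent gain (ReLU assumed here)
        self.eps = eps      # numerical stabilizer for the variance

    def standardized_weight(self) -> torch.Tensor:
        w = self.weight                               # (out_ch, in_ch, kH, kW)
        fan_in = w[0].numel()                         # weights per output unit
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), unbiased=False, keepdim=True)
        scale = self.gamma / math.sqrt(fan_in)
        return scale * (w - mean) / torch.sqrt(var + self.eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.standardized_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```

In the full architecture these convolutions are combined with the residual-branch scalars used in Normalizer-Free ResNets; those are omitted here for brevity.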
Adaptive Gradient Clipping
AGC offers a simple remedy for training instability by ensuring parameter updates remain bounded relative to the current parameter norm. This is particularly useful with large batch sizes, permitting stable and efficient convergence even under aggressive data augmentation such as RandAugment and CutMix.
\begin{equation}
G^{\ell}_i \rightarrow
\begin{cases}
\lambda \frac{\|W^{\ell}_i\|^{\star}_F}{\|G^{\ell}_i\|_F} G^{\ell}_i & \text{if } \frac{\|G^{\ell}_i\|_F}{\|W^{\ell}_i\|^{\star}_F} > \lambda, \\
G^{\ell}_i & \text{otherwise,}
\end{cases}
\end{equation}
where $G^{\ell}_i$ and $W^{\ell}_i$ denote the $i$-th row (output unit) of the gradient and weight matrix of layer $\ell$, $\|\cdot\|_F$ is the Frobenius norm, $\|W_i\|^{\star}_F = \max(\|W_i\|_F, \epsilon)$ with a small $\epsilon$ (e.g. $10^{-3}$), and $\lambda$ is the clipping threshold.
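A minimal PyTorch sketch of this clipping rule follows; the function name is an assumption for illustration, λ on the order of 0.01 is a typical value reported in the paper, and, as in the paper, one would usually exclude the final classifier layer from clipping.

```python
import torch

def adaptive_grad_clip(parameters, lam: float = 0.01, eps: float = 1e-3) -> None:
    """Unit-wise adaptive gradient clipping (illustrative sketch of AGC).

    For each parameter tensor, gradients are rescaled per output unit (first
    dimension) whenever ||G_i||_F / max(||W_i||_F, eps) exceeds lam, so each
    update stays bounded relative to the current parameter norm.
    """
    for p in parameters:
        if p.grad is None or p.dim() < 1:   # skip scalar gains, unused params
            continue
        w = p.detach().reshape(p.shape[0], -1)        # rows = output units
        g = p.grad.detach().reshape(p.shape[0], -1)
        w_norm = w.norm(dim=1, keepdim=True).clamp_min(eps)   # ||W_i||_F^*
        g_norm = g.norm(dim=1, keepdim=True).clamp_min(1e-6)  # ||G_i||_F
        # Shrink rows whose gradient norm is too large; leave the rest alone.
        scale = (lam * w_norm / g_norm).clamp(max=1.0)
        p.grad.detach().mul_(scale.reshape(-1, *([1] * (p.dim() - 1))))
```

It would be called between `loss.backward()` and `optimizer.step()`, e.g. `adaptive_grad_clip(model.parameters())`.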
This strategy can be viewed as a relaxation of normalized optimizers such as LARS: rather than fixing the norm of every update, AGC only caps how large an update can be relative to the parameter norm, preserving useful gradient-magnitude information while retaining stability.
Architecture Design and Results
Architecture Design: NFNets are designed to run efficiently on existing hardware, accelerating both training and inference. Starting from a modified SE-ResNeXt-D backbone, the authors rebalance stage depths and widths so the models make better use of accelerators such as GPUs and TPUs; a rough sketch of the resulting family scaling pattern is shown below.
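The following Python sketch illustrates the scaling pattern of the NFNet-F family; the specific depth and width values are approximate and should be checked against the paper's tables, so treat them as illustrative assumptions rather than an authoritative specification.

```python
# Approximate sketch of NFNet-F family scaling (values are assumptions).
# Each variant Fk repeats the F0 stage depths (k + 1) times; stage widths
# stay fixed while the training resolution grows with k.
F0_STAGE_DEPTHS = (1, 2, 6, 3)           # residual blocks per stage in NFNet-F0
STAGE_WIDTHS = (256, 512, 1536, 1536)    # output channels per stage

def nfnet_stage_depths(variant: int) -> tuple:
    """Stage depths for NFNet-F{variant}; e.g. variant=1 gives (2, 4, 12, 6)."""
    return tuple(d * (variant + 1) for d in F0_STAGE_DEPTHS)

for k in range(3):
    print(f"NFNet-F{k}: depths={nfnet_stage_depths(k)}, widths={STAGE_WIDTHS}")
```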
Performance: NFNet-F1 matches EfficientNet-B7's ImageNet accuracy while training as much as 8.7 times faster (Figure 2). After large-scale pre-training on roughly 300 million labeled images and fine-tuning on ImageNet, NFNets significantly outperform their batch-normalized counterparts, showcasing enhanced generalizability without extensive data augmentation.
Figure 2: ImageNet Validation Accuracy vs. Test GFLOPs showing NFNet models are competitive with EfficientNet variants, optimized for training latency.
Conclusion
The paper concludes that NFNets, powered by adaptive gradient clipping and thoughtful architectural adjustments, redefine the efficiency and accuracy standards of deep networks for large-scale image classification. By removing the dependency on normalization layers, NFNets demonstrate enhanced scalability, training speed, and adaptability to hardware constraints while achieving superior accuracies on challenging benchmarks like ImageNet. This approach is poised to influence future research directions and deployment strategies in large-scale AI applications.