- The paper introduces a training-aware NAS framework that enables models to train up to 4x faster and be up to 6.8x more parameter-efficient than previous architectures.
- It implements an adaptive progressive learning strategy that dynamically adjusts regularization as image sizes increase to preserve accuracy.
- Empirical results show EfficientNetV2, pretrained on ImageNet21k, achieving 87.3% top-1 accuracy on ImageNet, outperforming models such as ViT-L/16 while training substantially faster and with lower inference latency.
EfficientNetV2: Smaller Models and Faster Training
The paper "EfficientNetV2: Smaller Models and Faster Training" by Mingxing Tan and Quoc V. Le presents a novel family of convolutional neural networks that significantly enhance training speed and parameter efficiency compared to previous models. The research integrates training-aware neural architecture search (NAS) with optimized scaling to jointly enhance model size, number of parameters, and training duration. The paper also introduces an improved progressive learning approach that dynamically adjusts regularization in conjunction with image size increments to mitigate potential accuracy drops.
Key Contributions
- Efficiency-Oriented NAS and Scaling: EfficientNetV2 models are found with a training-aware NAS framework whose search space includes additional operations such as Fused-MBConv. The resulting models train up to 4x faster and are up to 6.8x more parameter-efficient than other state-of-the-art architectures.
- Optimized Progressive Learning: Naively training with progressively larger image sizes can slow convergence and hurt accuracy. EfficientNetV2 instead uses an adaptive progressive learning method that strengthens regularization (e.g., data augmentation, dropout rate) as the image size grows, preserving accuracy while keeping training fast (a minimal schedule sketch follows this list).
- Empirical Validation: The EfficientNetV2 family demonstrates superior performance on ImageNet and CIFAR/Cars/Flowers datasets. Notably, EfficientNetV2—pretrained on ImageNet21k—achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming Vision Transformer (ViT) models such as ViT-L/16 by 2% accuracy while training 5x-11x faster.
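To illustrate the progressive learning idea, the following is a minimal sketch assuming image size and regularization strength (dropout rate, RandAugment magnitude, mixup ratio) are interpolated linearly across a fixed number of training stages; the `progressive_schedule` helper and the concrete value ranges are illustrative, not the paper's exact settings.

```python
# Minimal sketch of adaptive progressive learning: image size and
# regularization strength are increased together over training stages.
# The stage count and min/max values below are illustrative assumptions.

def progressive_schedule(stage: int, num_stages: int,
                         image_size=(128, 300),
                         dropout=(0.1, 0.3),
                         randaug_magnitude=(5, 15),
                         mixup_alpha=(0.0, 0.2)):
    """Linearly interpolate image size and regularization for one stage."""
    t = stage / max(num_stages - 1, 1)  # 0.0 at the first stage, 1.0 at the last

    def lerp(lo, hi):
        return lo + t * (hi - lo)

    return {
        "image_size": int(lerp(*image_size)),
        "dropout": lerp(*dropout),
        "randaug_magnitude": int(lerp(*randaug_magnitude)),
        "mixup_alpha": lerp(*mixup_alpha),
    }

for stage in range(4):
    print(stage, progressive_schedule(stage, num_stages=4))
```

Early stages train on small images with weak regularization for speed; later stages train on large images with strong regularization to protect accuracy, which is the core of the method.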
Experimental Results
- Training Efficiency: EfficientNetV2 trains up to 11x faster than prior models (for example, EfficientNetV2-M versus EfficientNet-B7) while using up to 6.8x fewer parameters. Training efficiency was systematically assessed under different resource constraints, revealing significant gains over predecessors such as EfficientNet and ResNet.
- Inference Speed: When compared against Vision Transformers and ResNet-based models, EfficientNetV2 maintains competitive or superior accuracy with considerably reduced inference latency. For instance, EfficientNetV2-M achieves comparable accuracy to EfficientNet-B7, but it is 3.1x faster in inference.
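As a rough way to reproduce this kind of latency comparison, the sketch below times single-image CPU inference for both models. It assumes a torchvision version (0.13 or newer) that ships `efficientnet_v2_m` and `efficientnet_b7`, and uses 480 and 600 as the respective evaluation resolutions; absolute numbers vary widely with hardware and batch size, so treat the output as indicative rather than a reproduction of the paper's figures.

```python
# Rough single-image latency comparison between EfficientNetV2-M and
# EfficientNet-B7 (randomly initialized weights are fine for timing).
import time
import torch
from torchvision import models

def measure_latency(model, image_size, runs=10, warmup=3):
    """Average forward-pass time in seconds for one image on CPU."""
    model.eval()
    x = torch.randn(1, 3, image_size, image_size)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs

v2_m = models.efficientnet_v2_m()
b7 = models.efficientnet_b7()
print(f"EfficientNetV2-M: {measure_latency(v2_m, 480) * 1e3:.1f} ms/image")
print(f"EfficientNet-B7 : {measure_latency(b7, 600) * 1e3:.1f} ms/image")
```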
Model Architecture Insights
EfficientNetV2 introduces several key architectural changes relative to its predecessor, EfficientNet:
- Use of both MBConv and Fused-MBConv blocks, with Fused-MBConv placed in the early stages where it makes better use of modern accelerators (both block types are sketched in code below).
- Preference for smaller expansion ratios in MBConv, minimizing memory overhead.
- Inclusion of more layers in later stages to scale up capacity without proportional increases in computational cost.
The search also removes the last stride-1 stage found in the original EfficientNet, likely due to its large parameter count and memory access overhead, improving both speed and parameter efficiency.
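The two block types can be summarized in a short sketch, assuming their standard definitions: MBConv expands channels with a 1x1 convolution, applies a depthwise 3x3 convolution and squeeze-and-excitation, then projects back with a 1x1 convolution, while Fused-MBConv replaces the expansion and depthwise steps with a single regular 3x3 convolution. Normalization, activation, and squeeze-and-excitation placement are simplified relative to the paper.

```python
# Simplified MBConv and Fused-MBConv blocks in PyTorch.
import torch
import torch.nn as nn

def conv_bn_act(cin, cout, k, stride=1, groups=1, act=True):
    layers = [nn.Conv2d(cin, cout, k, stride, k // 2, groups=groups, bias=False),
              nn.BatchNorm2d(cout)]
    if act:
        layers.append(nn.SiLU())
    return nn.Sequential(*layers)

class SqueezeExcite(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.SiLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)  # channel-wise reweighting

class MBConv(nn.Module):
    def __init__(self, cin, cout, stride=1, expand=4):
        super().__init__()
        mid = cin * expand
        self.use_residual = stride == 1 and cin == cout
        self.block = nn.Sequential(
            conv_bn_act(cin, mid, 1),                      # 1x1 expansion
            conv_bn_act(mid, mid, 3, stride, groups=mid),  # depthwise 3x3
            SqueezeExcite(mid),                            # squeeze-and-excitation
            conv_bn_act(mid, cout, 1, act=False))          # 1x1 projection

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

class FusedMBConv(nn.Module):
    def __init__(self, cin, cout, stride=1, expand=4):
        super().__init__()
        mid = cin * expand
        self.use_residual = stride == 1 and cin == cout
        self.block = nn.Sequential(
            conv_bn_act(cin, mid, 3, stride),              # single regular 3x3 conv
            conv_bn_act(mid, cout, 1, act=False))          # 1x1 projection

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

x = torch.randn(1, 24, 56, 56)
print(FusedMBConv(24, 24)(x).shape, MBConv(24, 48, stride=2)(x).shape)
```

The fused block trades more FLOPs for fewer memory-bound depthwise operations, which is why it helps in the early, high-resolution stages but not uniformly across the network.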
Implications and Future Directions
EfficientNetV2 reaffirms the value of convolutional networks, especially when paired with optimized training strategies, in remaining competitive with emerging architectures such as Vision Transformers. The results suggest that scaling models with careful attention to both training cost and parameter efficiency, rather than accuracy alone, is a viable path forward.
For future work, applying EfficientNetV2 backbones to tasks such as object detection and segmentation could reveal further efficiency gains. Integrating new hardware-optimized operations into the search space, or refining the adaptive regularization schedule, may also yield additional improvements.
In conclusion, EfficientNetV2 makes a significant contribution to the design of neural architectures, demonstrating that jointly tuning the training process and the model architecture leads to substantial gains in both training and inference efficiency. The findings underscore the ongoing need to balance parameter efficiency, training speed, and accuracy in deep learning.