EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
The research paper "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks", authored by Mingxing Tan and Quoc V. Le of Google Research, Brain Team, presents a principled approach to scaling Convolutional Neural Networks (ConvNets). The work examines how the three scaling dimensions of a ConvNet, its depth, width, and input resolution, interact, and introduces a compound scaling method that balances all three to improve both accuracy and efficiency.
Key Insights and Contributions
The conventional approach to scaling ConvNets is to increase a single dimension, usually depth, width, or input resolution, in isolation. However, the accuracy gains from such single-dimension scaling diminish quickly as models grow larger. Through rigorous empirical analysis, the paper shows that balancing all three dimensions is essential for achieving superior accuracy and efficiency.
Compound Scaling Method
The core contribution of the paper is the introduction of a compound scaling method. This method uses a single compound coefficient ϕ to uniformly scale the network's depth (d), width (w), and resolution (r):
- d = α^ϕ
- w = β^ϕ
- r = γ^ϕ
- where α, β, and γ are constants determined by a small grid search on the baseline network, and ϕ is a user-specified coefficient that controls how many additional resources are available for scaling.
The constraint α · β² · γ² ≈ 2 (with α ≥ 1, β ≥ 1, γ ≥ 1) ensures that the model's complexity, measured in FLOPS, roughly doubles with each unit increment of ϕ.
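To make the rule concrete, here is a minimal Python sketch of the compound scaling computation. The constants α = 1.2, β = 1.1, γ = 1.15 are the grid-searched values the paper reports for the EfficientNet-B0 baseline; the function name and printed output are illustrative only.

```python
# Minimal sketch of compound scaling, assuming the paper's grid-searched
# constants for EfficientNet-B0: alpha=1.2, beta=1.1, gamma=1.15.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: float) -> dict:
    """Return depth/width/resolution multipliers for a compound coefficient phi."""
    d = ALPHA ** phi   # depth multiplier (number of layers)
    w = BETA ** phi    # width multiplier (number of channels)
    r = GAMMA ** phi   # resolution multiplier (input image size)
    # FLOPS grow roughly as d * w^2 * r^2, so alpha * beta^2 * gamma^2 ~= 2
    # means FLOPS approximately double for every unit increase in phi.
    flops_ratio = d * w ** 2 * r ** 2
    return {"depth": d, "width": w, "resolution": r, "flops_ratio": flops_ratio}

for phi in range(4):
    print(phi, compound_scale(phi))
```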
Experimental Validation
Scaling Experiments
The paper verifies the effectiveness of compound scaling by applying it to existing baseline models, including MobileNetV1, MobileNetV2, and ResNet-50. The results show a clear improvement over traditional single-dimension scaling at comparable computational cost. For example, scaling MobileNetV1 with the compound method reached 75.6% top-1 accuracy on ImageNet at 2.3 billion FLOPS, outperforming depth-only, width-only, and resolution-only scaling.
EfficientNets Architecture
A new baseline network, EfficientNet-B0, was first obtained through a neural architecture search that optimizes for both accuracy and FLOPS. The larger models, EfficientNet-B1 through EfficientNet-B7, were then generated by applying the compound scaling method to this baseline (a rough sketch of this scaling follows below), demonstrating superior performance:
- EfficientNet-B7 achieved a state-of-the-art top-1 accuracy of 84.3% on ImageNet while being 8.4 times smaller and 6.1 times faster at inference than GPipe, the previous record holder.
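As a rough illustration of how the family is generated, the sketch below applies the compound rule to a toy baseline configuration. The stage layout used here is hypothetical, not the published EfficientNet-B0 architecture, and the official B1 to B7 models additionally round and hand-tune the resulting numbers.

```python
import math

# Hedged sketch: applying the compound rule to a toy baseline configuration.
# The stage layout below is hypothetical, NOT the published EfficientNet-B0.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # constants reported in the paper

def scale_config(base_depths, base_channels, base_resolution, phi):
    depths = [math.ceil(n * ALPHA ** phi) for n in base_depths]              # more layers per stage
    channels = [int(round(c * BETA ** phi / 8) * 8) for c in base_channels]  # wider stages, rounded to multiples of 8
    resolution = int(round(base_resolution * GAMMA ** phi))                  # larger input images
    return depths, channels, resolution

# Toy baseline: three stages of (layers, channels) and a 224x224 input.
print(scale_config([1, 2, 2], [16, 24, 40], 224, phi=1.0))
```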
Moreover, EfficientNets transfer well: they achieved state-of-the-art accuracy on five of eight widely used transfer learning datasets, including CIFAR-100, Flowers, and Stanford Cars, with far fewer parameters than previous models, demonstrating strong generalization across tasks.
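A typical transfer learning setup along these lines is sketched below: reuse an ImageNet-pretrained EfficientNet-B0 and replace its classification head for a smaller dataset such as CIFAR-100. This assumes torchvision's EfficientNet port (torchvision 0.13 or later) rather than the authors' original TensorFlow implementation, and the 100-class head is just an example.

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained EfficientNet-B0 (torchvision's port of the model).
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)

# Replace the 1000-way ImageNet classifier with a 100-way head for CIFAR-100.
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, 100)

# Optionally freeze the convolutional backbone and fine-tune only the new head.
for param in model.features.parameters():
    param.requires_grad = False
```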
Theoretical and Practical Implications
Theoretical Implications
The compound scaling method provides a systematic, architecture-agnostic way to scale up ConvNets. It clarifies how the individual scaling dimensions interact and reinforce one another, and by demonstrating the value of balanced scaling, the work underscores the importance of a coordinated approach to growing deep neural networks.
Practical Implications
Practically, EfficientNets deliver high accuracy at a fraction of the computational cost of earlier models, making them well suited to resource-constrained environments such as mobile and edge devices. This efficiency is likely to spur further research into scalable neural architectures, raising the bar for both academic research and industrial applications.
Future Developments
The findings from this paper open avenues for several future directions in AI research. Potential developments include:
- Exploring Other Network Architectures: Applying compound scaling to other emerging architectures such as Transformer models could further validate its efficacy.
- Hardware-Aware Optimization: Integrating hardware constraints directly into the scaling process could enhance performance on specific devices.
- Automated Scaling Policies: Further advancements in neural architecture search to dynamically learn optimal scaling policies for diverse tasks and datasets could streamline the design of efficient models.
Conclusion
The paper "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" provides a rigorous and principled approach to ConvNet scaling, demonstrating that a balanced scaling strategy can lead to substantial gains in both accuracy and efficiency. By introducing the compound scaling method and validating it across multiple datasets, this work presents a significant advancement in the field of neural network scaling, with broad implications for future research and practical applications in AI.