Fixing the train-test resolution discrepancy: FixEfficientNet (2003.08237v5)

Published 18 Mar 2020 in cs.CV and cs.LG

Abstract: This paper provides an extensive analysis of the performance of the EfficientNet image classifiers with several recent training procedures, in particular one that corrects the discrepancy between train and test images. The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters. For instance, our FixEfficientNet-B0 trained without additional training data achieves 79.3% top-1 accuracy on ImageNet with 5.3M parameters. This is a +0.5% absolute improvement over the Noisy Student EfficientNet-B0 trained with 300M unlabeled images. An EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes achieves 88.5% top-1 accuracy (top-5: 98.7%), which establishes the new state of the art for ImageNet with a single crop. These improvements are thoroughly evaluated with cleaner protocols than the one usually employed for ImageNet, and in particular we show that our improvement remains in the experimental setting of ImageNet-v2, which is less prone to overfitting, and with ImageNet Real Labels. In both cases we also establish the new state of the art.

Authors (4)
  1. Hugo Touvron (22 papers)
  2. Andrea Vedaldi (195 papers)
  3. Matthijs Douze (52 papers)
  4. Hervé Jégou (71 papers)
Citations (110)

Summary

Fixing the Train-Test Resolution Discrepancy: FixEfficientNet

The paper "Fixing the Train-Test Resolution Discrepancy: FixEfficientNet" authored by Hugo Touvron, Andrea Vedaldi, Matthijs Douze, and Hervé Jegou from Facebook AI Research addresses a significant issue within the field of image classification using Convolutional Neural Networks (CNNs)—the discrepancy that often occurs between training and testing data distributions due to different preprocessing protocols. Traditionally, images are processed differently during training and testing phases, which can lead to skewed data distribution fed to the model and consequently impact its performance negatively.

Frame of Reference

EfficientNet models have established themselves as potent CNN architectures for image classification thanks to their favorable trade-off between parameter count and accuracy. Nonetheless, Touvron et al. argue that accuracy can be improved further by correcting the resolution discrepancy between the training and testing phases.

Introduction of FixEfficientNet

The authors propose FixEfficientNet, which applies a method known as FixRes on top of the EfficientNet architecture. FixRes jointly optimizes the resolution and scale used at training and test time so that the apparent size of objects in the Region of Classification (RoC) is sampled consistently across both phases, mitigating the discrepancy introduced by standard preprocessing pipelines.
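
A minimal sketch of this adaptation step, assuming a PyTorch model exposing a final `classifier` or `fc` head (the function name `fixres_finetune` and the hyperparameters are hypothetical, not from the paper): after ordinary training, the network is briefly fine-tuned on images preprocessed at the test resolution, updating only the batch-norm layers and the classifier.

```python
import torch

def fixres_finetune(model, loader_at_test_res, epochs=2, lr=1e-3):
    """Briefly adapt an already-trained model to the test-time resolution."""
    # Freeze the backbone; only the head and batch-norm layers adapt.
    for p in model.parameters():
        p.requires_grad = False
    head = model.classifier if hasattr(model, "classifier") else model.fc
    for p in head.parameters():
        p.requires_grad = True
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.reset_running_stats()  # re-estimate stats at the new resolution
            for p in m.parameters():
                p.requires_grad = True

    opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                          lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        # loader_at_test_res yields training images preprocessed at test res
        for images, labels in loader_at_test_res:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
    return model
```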

Insights and Numerical Results

Integrating FixRes into EfficientNet yields significant performance gains. Notably, FixEfficientNet-B0 reaches 79.3% top-1 accuracy on ImageNet with 5.3 million parameters and no additional training data, surpassing the Noisy Student EfficientNet-B0, which was trained with 300M unlabeled images. At the high end, an EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes reaches 88.5% top-1 and 98.7% top-5 accuracy, establishing a new state of the art for ImageNet with single-crop evaluation.

These results are substantiated with cleaner protocols that simplify comparison and avoid the pitfalls of overfitting, as evidenced by performance metrics on ImageNet Real Labels and ImageNet-v2 datasets. Such empirical findings underline the robustness of FixEfficientNet in establishing state-of-the-art results across varying experimental conditions.

Practical & Theoretical Implications

Adopting FixEfficientNet has several practical implications. Computationally, the fine-tuning step is cheap: only the classifier (or the upper network layers) and batch-norm statistics need updating, which keeps the overhead low. The method is also flexible, since it can be appended to any CNN architecture, as illustrated below, and combines with other training refinements such as label smoothing. Theoretically, it underlines the importance of consistent data distributions across the training pipeline, setting a precedent for harmonizing the training and testing phases to obtain better generalization.
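
As an illustration of this plug-and-play property, the hypothetical snippet below prepares an off-the-shelf EfficientNet-B0 for a FixRes-style adaptation with the `fixres_finetune` sketch above. It assumes a recent torchvision (>= 0.13 for the `weights` argument); the 320px resolution and the crop ratio are illustrative choices, not values from the paper.

```python
from torchvision import models, transforms

# Off-the-shelf backbone originally trained at 224px.
model = models.efficientnet_b0(weights="IMAGENET1K_V1")

test_res = 320  # illustrative higher test-time resolution
adapt_tf = transforms.Compose([
    transforms.Resize(int(test_res * 1.15)),
    transforms.CenterCrop(test_res),
    transforms.ToTensor(),
])
# A DataLoader over the training set preprocessed with `adapt_tf` can
# then be passed to fixres_finetune(model, loader).
```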

Future Perspectives in AI

FixEfficientNet exemplifies a broader direction in optimizing CNN architectures and hints at the trajectory of machine learning models. The research points to a future where train-test resolution discrepancies are tuned systematically to improve generalization, an idea that could carry over to domains such as object detection or scene parsing. Furthermore, as AI continues to evolve, methods like FixRes could combine with neural architecture search and similar adaptive frameworks to mitigate overfitting and maintain robust, reliable performance as task requirements and datasets change.

In summary, "Fixing the Train-Test Resolution Discrepancy: FixEfficientNet" makes a focused contribution to the line of CNN improvements: by synchronizing preprocessing protocols across the distinct phases of model deployment, it bolsters both accuracy and generalization across the board.
