- The paper establishes that ImageNet accuracy strongly correlates with transfer performance, with r = 0.99 for fixed feature extraction and r = 0.96 for fine-tuning.
- The study finds that fixed feature extractors are highly sensitive to regularization: techniques such as label smoothing and dropout, while helpful for ImageNet accuracy, can reduce the transferability of the resulting features.
- On small fine-grained classification datasets, ImageNet pretraining provides only limited gains over training from scratch, although better ImageNet architectures still tend to perform better.
Do Better ImageNet Models Transfer Better?
"Do Better ImageNet Models Transfer Better?" addresses a critical gap in computer vision research: the assumed but unverified belief that high-performing ImageNet models inherently excel in transferring to diverse vision tasks. This paper encompasses a comprehensive evaluation, contrasting 16 classification networks across 12 image classification datasets.
Key Findings
The paper's primary findings are:
- Strong Correlation in Transfer Performance:
- When models are employed as fixed feature extractors, there is a strong correlation between ImageNet accuracy and transfer accuracy (r = 0.99).
- In the fine-tuning setting, the correlation remains strong (r = 0.96). A minimal sketch of both transfer settings appears after this list.
- Sensitivity to Regularization:
- Fixed feature extractors are highly sensitive to the regularization used during ImageNet training. Certain regularizers, although beneficial for ImageNet accuracy, hurt the transferability of penultimate-layer features.
- Specifically, removing regularizers such as label smoothing and dropout improved transfer performance, despite a marginal drop in ImageNet accuracy.
- Limited Benefits for Fine-Grained Tasks:
- On smaller, fine-grained classification datasets, pretraining on ImageNet showed minimal benefit over training from scratch, indicating that the learned features transfer less directly to these tasks.
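The two transfer settings behind these correlations are fixed feature extraction, where the pretrained backbone is frozen and only a classifier on the penultimate-layer features is trained (the paper fits a logistic regression), and full fine-tuning, where every weight is updated. Below is a minimal PyTorch sketch of both; the torchvision ResNet-50 stands in for the paper's 16 architectures, the 102-class head is a hypothetical downstream task, and the frozen-backbone-plus-linear-head setup approximates rather than reproduces the paper's logistic-regression protocol.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 102  # hypothetical downstream dataset size

def build_fixed_feature_model():
    """Fixed feature extractor: freeze the ImageNet backbone, train only a new head."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    for p in model.parameters():
        p.requires_grad = False  # penultimate-layer features stay frozen
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new head is trainable
    return model

def build_finetune_model():
    """Fine-tuning: replace the head but keep every weight trainable."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
    return model

def trainable_parameters(model):
    return [p for p in model.parameters() if p.requires_grad]

# In the fixed-feature setting the optimizer only ever sees the new head,
# so transfer accuracy depends entirely on how good the frozen features are.
fixed = build_fixed_feature_model()
tuned = build_finetune_model()
opt_fixed = torch.optim.SGD(trainable_parameters(fixed), lr=0.01, momentum=0.9)
opt_tuned = torch.optim.SGD(tuned.parameters(), lr=0.001, momentum=0.9)
```

The contrast also makes the regularization finding intuitive: in the fixed setting nothing downstream can compensate for less transferable frozen features, which is why it is the setting most sensitive to the regularization used during ImageNet training.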
Practical and Theoretical Implications
- Model Generalization:
- The paper underscores that high-performing ImageNet architectures generally transfer well across different datasets. Hence, investing in the development of more accurate ImageNet models is likely to pay off on other vision tasks as well.
- Feature Sensitivity:
- The observation that regularization harms the transfer performance of fixed features suggests a trade-off between achieving high ImageNet accuracy and obtaining generalizable feature representations. This insight could refine training protocols, with regularization chosen according to the end-use case (a configuration sketch appears after this list).
- Fine-Grained Recognition:
- For fine-grained classification tasks, the paper shows that the benefits of better architectures persist even when the benefit of pretrained weights is minimal. This suggests that while weight transfer may not always be advantageous, choosing a strong architecture remains important.
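The regularization trade-off noted under Feature Sensitivity comes down to ordinary training-configuration choices: label smoothing is an option on the loss, and dropout is a layer in front of the classifier head. The sketch below contrasts the two configurations; the specific values (dropout 0.5, smoothing 0.1) are common defaults chosen for illustration, not necessarily the paper's exact settings.

```python
import torch.nn as nn

def make_head_and_loss(in_features: int, num_classes: int, *, regularize: bool):
    """Two ImageNet training configurations for the same backbone.

    regularize=True  -> label smoothing + dropout: tends to help ImageNet top-1,
                        but the paper finds it hurts fixed-feature transfer.
    regularize=False -> plain cross-entropy, no dropout: slightly lower ImageNet
                        accuracy, more transferable penultimate-layer features.
    """
    if regularize:
        head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(in_features, num_classes))
        loss = nn.CrossEntropyLoss(label_smoothing=0.1)
    else:
        head = nn.Linear(in_features, num_classes)
        loss = nn.CrossEntropyLoss()
    return head, loss
```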
Speculation on Future Developments
Moving forward, the challenge lies in devising models that offer both high ImageNet accuracy and superior transferability across diverse tasks without compromising on either. Future research could explore:
- Regularization Techniques: Developing new regularization techniques or adapting existing ones to optimize both ImageNet performance and feature robustness for transfer learning.
- Few-Shot Learning Approaches: Integrating few-shot learning methodologies to enhance the adaptability of models trained on large datasets like ImageNet to smaller, more specialized task domains.
- Meta-Learning: Advanced meta-learning algorithms could play a pivotal role in creating models adept at rapidly fine-tuning to new, unseen tasks with limited data.
Conclusion
The paper "Do Better ImageNet Models Transfer Better?" offers compelling evidence that while better ImageNet models generally transfer better, this relationship is complex and influenced by the training regimen. The findings prompt a reevaluation of how models are trained and transferred, emphasizing a balanced approach to regularization and architecture selection to optimize for both task-specific fidelity and generalization capabilities.