- The paper reveals that reducing pre-training images per class results in only a minor performance drop (e.g., 1.5 mAP on PASCAL-DET).
- The paper demonstrates that pre-training with far fewer classes (127 instead of 1000) costs only a small amount of transfer performance, and some tasks even benefit, suggesting that fine class distinctions are not required for learning transferable features.
- The study shows that coarse category training is sufficient to develop robust discriminative features for fine-grained tasks.
Insights into ImageNet's Efficacy for Transfer Learning
The research paper "What Makes ImageNet Good for Transfer Learning?" provides a rigorous empirical analysis questioning widespread assumptions about the essential factors contributing to ImageNet's efficacy in transfer learning scenarios. The authors investigate the contribution of various dataset attributes, such as the number of examples, number of classes, fine versus coarse class granularity, and overall dataset taxonomy, to the success of ImageNet-trained CNN features.
Key Findings and Methodologies
- Data Sufficiency: The paper reveals that a significant reduction in pre-training images per class does not substantially degrade transfer learning performance. For instance, halving the number of images per class from 1000 to 500 results in only a minor decrease in performance (1.5 mAP drop on PASCAL-DET), which is less than the decline observed in direct ImageNet classification performance.
- Class Sufficiency: Similarly, reducing the number of pre-training classes from 1000 to 127 yields only a slight performance drop on transfer tasks (2.8 mAP on PASCAL-DET), and certain tasks even perform better with fewer classes. This indicates that very fine class distinctions are not crucial for learning transferable features (both this and the data-sufficiency manipulation are sketched in code after this list).
- Fine-grained versus Coarse Recognition: The research demonstrates that fine-grained discrimination during pre-training is not essential for learning good features. Features trained on coarser classes still handle fine-grained distinctions well at transfer time, retaining most of their performance when asked to separate fine classes.
- Coarse Class Induction: The paper also examines whether pre-training on coarse categories induces the capability to discern fine classes that were never seen as explicit labels during training. The results affirm that coarse-class features remain robustly discriminative at this finer granularity.
- Number of Classes versus Examples per Class: Given a fixed total number of pre-training images, the paper finds a slight advantage for fewer classes with more examples each over more classes with fewer examples each. This suggests that a richer representation may come from deeper coverage of fewer concepts.
- Unrelated Class Augmentation: Counterintuitively, adding unrelated classes to the pre-training set does not always improve transfer performance, indicating that indiscriminately enlarging the label set is no guarantee of better features.
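To make the first two ablations concrete, here is a minimal sketch of how such reduced pre-training sets could be constructed with torchvision. It is illustrative only, not the authors' code: the ImageNet path, the `train_transform`, and the `fine_to_coarse` mapping from the 1000 fine synsets to (for example) 127 coarser WordNet ancestors are assumed placeholders. The same `subsample_per_class` helper can also express the fixed-budget trade-off between number of classes and examples per class.

```python
import random
from collections import defaultdict

from torchvision.datasets import ImageFolder


def subsample_per_class(dataset: ImageFolder, images_per_class: int, seed: int = 0):
    """Keep at most `images_per_class` images per class (the data-sufficiency ablation)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset.samples):
        by_class[label].append(idx)
    keep = []
    for indices in by_class.values():
        rng.shuffle(indices)
        keep.extend(indices[:images_per_class])
    return keep  # wrap with torch.utils.data.Subset(dataset, keep) for pre-training


def collapse_to_coarse(dataset: ImageFolder, fine_to_coarse: dict) -> ImageFolder:
    """Relabel every image with a coarser ancestor class (the class-granularity ablation)."""
    dataset.samples = [(path, fine_to_coarse[fine]) for path, fine in dataset.samples]
    dataset.targets = [coarse for _, coarse in dataset.samples]
    return dataset


# Hypothetical usage; paths, transform, and the 1000->127 mapping are placeholders.
# imagenet = ImageFolder("/data/imagenet/train", transform=train_transform)
# half_data_indices = subsample_per_class(imagenet, images_per_class=500)
# coarse_set = collapse_to_coarse(imagenet, fine_to_coarse)
```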
Practical Implications and Future Directions
This paper's findings offer significant implications for the design of CNN training regimes, particularly in contexts where a full-scale ImageNet dataset is unavailable or impractical. Researchers and practitioners can leverage fewer classes or images without substantial detriment to transfer performance, reducing computational costs and potentially expediting the model deployment process.
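The kind of reuse described above can be sketched as follows. This is a hedged, minimal transfer setup rather than the paper's actual PASCAL-DET detection pipeline: it loads an AlexNet pre-trained on a reduced ImageNet subset, freezes the convolutional features, and fine-tunes the fully connected layers with a new final layer for the target label space. The checkpoint name and class counts are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet


def build_transfer_model(pretrained_weights: str, pretrain_classes: int,
                         num_target_classes: int) -> nn.Module:
    """Reuse frozen convolutional features from subset pre-training; retrain the head."""
    # Build the architecture with the pre-training label space, then load our checkpoint.
    model = alexnet(weights=None, num_classes=pretrain_classes)
    model.load_state_dict(torch.load(pretrained_weights, map_location="cpu"))
    # Keep the pre-trained convolutional features fixed during transfer.
    for param in model.features.parameters():
        param.requires_grad = False
    # Swap the final fully connected layer for the target task's label space.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_target_classes)
    return model


# Hypothetical usage: optimize only the unfrozen parameters on the target-task data.
# model = build_transfer_model("alexnet_127class_pretrain.pt",
#                              pretrain_classes=127, num_target_classes=20)
# optimizer = torch.optim.SGD(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9
# )
```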
Future investigations might explore whether these findings hold across deeper architectures like ResNet or VGG. Furthermore, examining alternative datasets with divergent target tasks could provide additional insights into the generalizability and scope of transfer learning methodologies.
In conclusion, while prevailing assumptions tie effective transfer learning to sheer dataset size and class count, this paper offers a more nuanced understanding of CNN pre-training. It highlights how much of ImageNet's volume and fine-grained labeling may be redundant for transfer, and it advocates a more deliberate, strategically curated approach to dataset construction for transfer learning.