Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning
The paper "Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning," authored by Cui et al., investigates methods to enhance fine-grained visual categorization (FGVC) through novel transfer learning strategies. The research identifies the key factors that drive performance in large-scale categorization tasks and explores domain-specific transfer learning techniques to improve results on smaller, more specialized datasets.
Methodology and Key Contributions
The authors first address the challenge of FGVC, which involves recognizing subtle differences within categories such as bird species or car models. A central theme of this work is leveraging large datasets, like ImageNet, to transfer learned knowledge to domain-specific tasks. This approach is particularly useful given the difficulty of acquiring domain-specific annotated data, which often requires expert knowledge.
Large Scale FGVC
The authors' FGVC method achieved first place in the iNaturalist 2017 species classification challenge. They emphasize two main innovations:
- Higher Resolution Images: Unlike traditional datasets such as ImageNet, iNaturalist's images come in diverse resolutions and scales, motivating the use of higher resolution inputs to the network. This improved performance, highlighting the importance of capturing fine details in FGVC tasks.
- Long-Tailed Distribution Management: The research outlines a two-stage training process to handle the class imbalance common in natural datasets. By first training on the entire dataset and then fine-tuning on a more balanced subset, the approach yielded measurably lower error rates on underrepresented categories.
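The second stage of this procedure hinges on constructing a class-balanced subset of a long-tailed dataset. A minimal NumPy sketch of that step is shown below; the helper name `balanced_subset` and the strategy of capping every class at the tail-class size are illustrative assumptions, not the authors' released code:

```python
import numpy as np

def balanced_subset(labels, per_class=None, seed=0):
    """Pick sample indices for stage-2 fine-tuning: each class is capped
    at `per_class` examples (default: the size of the rarest class)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    if per_class is None:
        per_class = counts.min()  # truncate head classes down to the tail size
    picked = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        take = min(per_class, idx.size)
        picked.append(rng.choice(idx, size=take, replace=False))
    return np.concatenate(picked)
```

Stage 1 would train on all indices as usual; stage 2 fine-tunes (typically at a lower learning rate) on the indices this helper returns, so gradient updates are no longer dominated by head classes.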
Transfer Learning and Domain Similarity
A novel contribution of this paper is the introduction of a metric to evaluate domain similarity using the Earth Mover's Distance (EMD). By quantifying the visual similarity between source and target datasets, the authors demonstrate that selecting a source domain similar to the target enhances the efficiency of transfer learning.
- Domain Similarity Metric: The proposed metric calculates the visual similarity based on feature representations. The research shows that FGVC performance benefits when networks are pre-trained on a dataset that is visually aligned with the target domain.
- Empirical Results: The proposed transfer learning strategy outperformed traditional ImageNet-based pre-training on several FGVC datasets, achieving state-of-the-art results. This underscores the value of domain-specific pre-training as a practical alternative to collecting additional labeled data.
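The similarity computation described above can be sketched with NumPy and SciPy: EMD is solved as a small transportation linear program over class-level features, and similarity is taken as a decaying exponential of that distance (the scale γ = 0.01 and the helper names below are illustrative assumptions; the paper's exact feature extraction pipeline is omitted):

```python
import numpy as np
from scipy.optimize import linprog

def emd(weights_a, weights_b, cost):
    """Earth Mover's Distance via the transportation LP:
    minimize sum_ij cost[i,j] * flow[i,j] subject to the marginals."""
    m, n = cost.shape
    A_eq = []
    for i in range(m):                    # row marginals: sum_j flow[i,j] = weights_a[i]
        row = np.zeros(m * n)
        row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row)
    for j in range(n):                    # column marginals: sum_i flow[i,j] = weights_b[j]
        col = np.zeros(m * n)
        col[j::n] = 1.0
        A_eq.append(col)
    b_eq = np.concatenate([weights_a, weights_b])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun

def domain_similarity(src_feats, src_counts, tgt_feats, tgt_counts, gamma=0.01):
    """Similarity between a source and target domain: exp(-gamma * EMD),
    where each class is a point (its mean feature) weighted by its share
    of the dataset. `gamma` here is an assumed setting."""
    wa = np.asarray(src_counts, dtype=float)
    wb = np.asarray(tgt_counts, dtype=float)
    wa, wb = wa / wa.sum(), wb / wb.sum()
    # pairwise Euclidean distances between class-mean features
    cost = np.linalg.norm(src_feats[:, None, :] - tgt_feats[None, :, :], axis=-1)
    return np.exp(-gamma * emd(wa, wb, cost))
```

Ranking candidate source datasets by `domain_similarity` against the target then gives a principled way to pick which dataset to pre-train on.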
Implications and Future Directions
This research offers a comprehensive approach to tackling FGVC by addressing both dataset scale and domain specificity. The domain similarity measure provides a strategic advantage in selecting appropriate pre-training datasets, improving accuracy on specialized tasks without resorting to extensive manual data labeling.
Future work could investigate further refinement in similarity measures or incorporate additional domain factors beyond visual characteristics. Additionally, extending this methodology to other domains, such as text or audio-based classification, could uncover broader applications of domain-specific transfer learning strategies.
In conclusion, Cui et al.'s paper advances the understanding of FGVC and transfer learning, providing new methodological tools and empirical evidence. Their approach presents a credible path for future explorations in domain-informed pre-training, promising enhancements in the precision of AI models across various applications.