Few-Shot Image Classification Through Transductive Fine-Tuning
The paper "A Baseline for Few-Shot Image Classification" presents a systematic approach to few-shot learning by advocating for transductive fine-tuning as a robust and straightforward baseline. This method challenges the intricate models dominating the few-shot learning space, demonstrating that simplicity paired with careful design choices yields competitive or superior results.
Overview
The authors show that fine-tuning a deep neural network pre-trained with standard cross-entropy loss provides strong performance in few-shot scenarios. Performance improves further with transductive fine-tuning, in which the unlabeled test (query) samples of an episode are used during inference. The approach outperforms state-of-the-art methods on standard benchmarks such as Mini-ImageNet and Tiered-ImageNet.
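Below is a minimal PyTorch sketch of this transductive objective, assuming the loss structure described in the paper: cross-entropy on the labeled support set plus the Shannon entropy of predictions on the unlabeled query set. The function names, optimizer choice, step count, and learning rate are illustrative placeholders, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def transductive_loss(model, support_x, support_y, query_x):
    """Transductive fine-tuning objective: supervised cross-entropy on the
    labeled support set plus the Shannon entropy of the model's predictions
    on the unlabeled query set (encouraging confident query predictions)."""
    # Supervised term: standard cross-entropy on the few labeled shots.
    support_logits = model(support_x)
    ce = F.cross_entropy(support_logits, support_y)

    # Transductive term: entropy of the predictive distribution on queries.
    query_logits = model(query_x)
    probs = F.softmax(query_logits, dim=1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1).mean()

    return ce + entropy

def finetune(model, support_x, support_y, query_x, steps=25, lr=5e-5):
    """Per-episode fine-tuning loop sketch: both the feature extractor and
    the classifier are updated for a small number of steps."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = transductive_loss(model, support_x, support_y, query_x)
        loss.backward()
        opt.step()
```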
Key Contributions
- Transductive Fine-Tuning: The paper introduces a baseline that leverages unlabeled test data during fine-tuning, optimizing a model trained on a separate meta-training dataset. This involves adapting both the classifier and the feature extractor using information from the specific task at hand.
- Support-Based Initialization: Drawing on deep metric learning, the paper initializes each class's classifier weights with the mean of the normalized features of that class's support samples, which maximizes the cosine similarity between class weights and support features (a sketch follows this list).
- Benchmark Results: Conducting experiments across popular few-shot datasets, the authors demonstrate that their method surpasses existing benchmarks without needing specialized training per dataset or few-shot protocol.
- Scalability: The method has been tested on large-scale datasets like ImageNet-21k, illustrating its feasibility and robustness in few-shot scenarios involving significant data.
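A minimal sketch of the support-based initialization, assuming L2-normalized embeddings and a cosine-style linear classifier; `backbone` and the tensor names in the usage comments are hypothetical placeholders:

```python
import torch
import torch.nn.functional as F

def support_based_init(features, labels, num_classes):
    """Initialize classifier weights from support features: each class weight
    is the mean of the L2-normalized features of that class's support samples,
    so the resulting cosine classifier starts aligned with the support set."""
    features = F.normalize(features, dim=1)           # unit-norm embeddings
    weights = torch.zeros(num_classes, features.size(1))
    for c in range(num_classes):
        weights[c] = features[labels == c].mean(dim=0)
    return F.normalize(weights, dim=1)                # unit-norm class weights

# Usage sketch: embed the support set with the pre-trained backbone, then
# build the linear classifier from those embeddings before fine-tuning.
# feats = backbone(support_x)                         # (N, D) support features
# W = support_based_init(feats, support_y, num_classes=5)
# logits = F.linear(F.normalize(query_feats, dim=1), W)
```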
Numerical Results and Claims
The proposed approach reaches 68.11% accuracy on the 1-shot, 5-way Mini-ImageNet task, notably higher than existing methods. On ImageNet-21k, it achieves 58.04% in the 5-shot, 20-way setting, demonstrating its applicability to large-scale tasks.
Theoretical and Practical Implications
Theoretically, the paper challenges the perceived necessity of complex meta-learning algorithms, suggesting that improvements may instead come from combining conventional supervised learning with transductive methods. Practically, it opens new avenues for few-shot systems by emphasizing simplicity, robustness, and efficiency, particularly when dealing with large and heterogeneous datasets.
Future Directions
The implications of transductive fine-tuning suggest exploring hybrid models that combine transduction with other semi-supervised learning techniques. Additionally, lightly tuning hyperparameters per dataset on top of the shared baseline configuration could further enhance performance without over-specializing to any single benchmark.
Conclusion
The paper advocates a reevaluation of the landscape of few-shot learning, asserting that simplicity empowered by transductive learning offers a reliable and scalable baseline. This approach not only underscores the potential advantages of straightforward techniques but also facilitates a better understanding of the inherent challenges and the true efficacy of emerging few-shot learning algorithms.