A Baseline for Few-Shot Image Classification (1909.02729v5)

Published 6 Sep 2019 in cs.LG, cs.CV, and stat.ML

Abstract: Fine-tuning a deep network trained with the standard cross-entropy loss is a strong baseline for few-shot learning. When fine-tuned transductively, this outperforms the current state-of-the-art on standard datasets such as Mini-ImageNet, Tiered-ImageNet, CIFAR-FS and FC-100 with the same hyper-parameters. The simplicity of this approach enables us to demonstrate the first few-shot learning results on the ImageNet-21k dataset. We find that using a large number of meta-training classes results in high few-shot accuracies even for a large number of few-shot classes. We do not advocate our approach as the solution for few-shot learning, but simply use the results to highlight limitations of current benchmarks and few-shot protocols. We perform extensive studies on benchmark datasets to propose a metric that quantifies the "hardness" of a few-shot episode. This metric can be used to report the performance of few-shot algorithms in a more systematic way.

Few-Shot Image Classification Through Transductive Fine-Tuning

The paper "A Baseline for Few-Shot Image Classification" presents a systematic approach to few-shot learning by advocating for transductive fine-tuning as a robust and straightforward baseline. This method challenges the intricate models dominating the few-shot learning space, demonstrating that simplicity paired with careful design choices yields competitive or superior results.

Overview

The authors propose that fine-tuning a deep neural network, initially trained with the cross-entropy loss, provides strong performance in few-shot learning scenarios. Performance improves further with transductive fine-tuning, in which the unlabeled test (query) samples of a task are also used during inference. The approach outperforms state-of-the-art methods on standard datasets such as Mini-ImageNet and Tiered-ImageNet.
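
To make the mechanism concrete, the following is a minimal PyTorch-style sketch of one transductive fine-tuning step: cross-entropy on the labeled support samples plus an entropy penalty on the unlabeled query samples. The names (`backbone`, `classifier`, `support_x`, `query_x`, etc.) are illustrative placeholders, not identifiers from the paper's code.

```python
import torch
import torch.nn.functional as F

def transductive_finetune_step(backbone, classifier, optimizer,
                               support_x, support_y, query_x):
    """One fine-tuning step: cross-entropy on the labeled support set
    plus an entropy penalty on the unlabeled query set (transduction)."""
    optimizer.zero_grad()

    # Logits for the few labeled support samples and the unlabeled queries.
    support_logits = classifier(backbone(support_x))
    query_logits = classifier(backbone(query_x))

    # Standard cross-entropy on the support set.
    ce_loss = F.cross_entropy(support_logits, support_y)

    # Shannon entropy of the query predictions; minimizing it encourages
    # confident, well-separated predictions on the test samples.
    query_probs = F.softmax(query_logits, dim=1)
    entropy = -(query_probs * torch.log(query_probs + 1e-8)).sum(dim=1).mean()

    loss = ce_loss + entropy
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because both the feature extractor and the classifier receive gradients, the whole network adapts to the specific few-shot task rather than only the final layer.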

Key Contributions

  1. Transductive Fine-Tuning: The paper introduces a baseline that leverages unlabeled test data during fine-tuning, optimizing a model trained on a separate meta-training dataset. This involves adapting both the classifier and the feature extractor using information from the specific task at hand.
  2. Support-Based Initialization: Drawing on deep metric learning, the classifier weights are initialized from the support samples so that the initial logits are cosine similarities between class weights and sample features (a sketch follows this list).
  3. Benchmark Results: Conducting experiments across popular few-shot datasets, the authors demonstrate that their method surpasses existing benchmarks without needing specialized training per dataset or few-shot protocol.
  4. Scalability: The method has been tested on large-scale datasets like ImageNet-21k, illustrating its feasibility and robustness in few-shot scenarios involving significant data.
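
As referenced in the second contribution, here is a hedged sketch of support-based initialization under a common instantiation: each class weight is set to the normalized mean of that class's support features, so the initial logits are cosine similarities. Details such as bias handling and temperature scaling follow the paper rather than this sketch.

```python
import torch
import torch.nn.functional as F

def support_based_init(backbone, support_x, support_y, num_classes):
    """Initialize classifier weights from support features so that the
    initial logits are cosine similarities to per-class feature means."""
    with torch.no_grad():
        feats = F.normalize(backbone(support_x), dim=1)  # unit-norm features
        weights = torch.zeros(num_classes, feats.size(1))
        for c in range(num_classes):
            # Class weight = normalized mean of that class's support features.
            weights[c] = F.normalize(feats[support_y == c].mean(dim=0), dim=0)
    return weights  # used as the weight matrix of the final linear layer
```

This gives the fine-tuning procedure a sensible starting point even in the 1-shot regime, where a randomly initialized classifier would begin far from any useful decision boundary.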

Numerical Results and Claims

The proposed approach achieves notable accuracies, such as 68.11% on the 1-shot, 5-way Mini-ImageNet task, significantly higher than existing methods. On ImageNet-21k, it reaches 58.04% in the 5-shot, 20-way setting, demonstrating that the approach remains practical at large scale.
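
For readers unfamiliar with the protocol terminology, the sketch below shows how an N-way, K-shot episode (e.g., 1-shot, 5-way) is typically sampled from a labeled pool; `dataset`, `n_way`, `k_shot`, and `q_queries` are illustrative names, and the exact sampling procedure in the paper may differ.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=15):
    """Sample one N-way, K-shot episode: N classes, K labeled support
    images per class, and q query images per class for evaluation.
    `dataset` is assumed to be an iterable of (image, label) pairs."""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        images = random.sample(by_class[cls], k_shot + q_queries)
        support += [(img, episode_label) for img in images[:k_shot]]
        query += [(img, episode_label) for img in images[k_shot:]]
    return support, query
```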

Theoretical and Practical Implications

Theoretically, this paper challenges the perceived necessity of complex meta-learning algorithms, suggesting that improvements may instead derive from traditional supervised learning techniques combined with transduction. Practically, it opens new avenues for few-shot systems by emphasizing simplicity, robustness, and efficiency, particularly when dealing with large and heterogeneous datasets.

Future Directions

The implications of transductive fine-tuning suggest exploring hybrid models that combine transduction with other semi-supervised learning techniques. Additionally, modest per-dataset hyperparameter tuning on top of a shared baseline configuration could further improve performance without over-specialization.

Conclusion

The paper advocates a reevaluation of the landscape of few-shot learning, asserting that simplicity empowered by transductive learning offers a reliable and scalable baseline. This approach not only underscores the potential advantages of straightforward techniques but also facilitates a better understanding of the inherent challenges and the true efficacy of emerging few-shot learning algorithms.

Authors (4)
  1. Guneet S. Dhillon (6 papers)
  2. Pratik Chaudhari (75 papers)
  3. Avinash Ravichandran (35 papers)
  4. Stefano Soatto (179 papers)
Citations (550)