Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference (2204.07305v1)

Published 15 Apr 2022 in cs.CV and cs.LG

Abstract: Few-shot learning (FSL) is an important and topical problem in computer vision that has motivated extensive research into numerous methods spanning from sophisticated meta-learning methods to simple transfer learning baselines. We seek to push the limits of a simple-but-effective pipeline for more realistic and practical settings of few-shot image classification. To this end, we explore few-shot learning from the perspective of neural network architecture, as well as a three stage pipeline of network updates under different data supplies, where unsupervised external data is considered for pre-training, base categories are used to simulate few-shot tasks for meta-training, and the scarcely labelled data of a novel task is taken for fine-tuning. We investigate questions such as: (1) How pre-training on external data benefits FSL? (2) How state-of-the-art transformer architectures can be exploited? and (3) How fine-tuning mitigates domain shift? Ultimately, we show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks such as Mini-ImageNet, CIFAR-FS, CDFSL and Meta-Dataset. Our code and demo are available at https://hushell.github.io/pmf.

Authors (5)
  1. Shell Xu Hu (18 papers)
  2. Da Li (96 papers)
  3. Jan Stühmer (13 papers)
  4. Minyoung Kim (34 papers)
  5. Timothy M. Hospedales (69 papers)
Citations (164)

Summary

Evaluation of Simple Pipelines in Few-Shot Learning: The Case for Pre-Training and Fine-Tuning

"Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference," authored by Hu, Li, Stühmer, Kim, and Hospedales, provides an empirical investigation into leveraging external data and fine-tuning within a simple few-shot learning (FSL) pipeline. The paper seeks to elucidate the implications of pre-training and architecture choices on the performance of FSL, demonstrating that adopting state-of-the-art architectures and pre-training strategies can offer substantial improvements.

Key Contributions

The paper prominently investigates three pivotal considerations in FSL:

  1. Pre-Training on External Data: The research underscores the impact of pre-training on external data. Using self-supervised techniques such as DINO, the authors show notable performance gains on downstream FSL tasks. Their exploration of models pre-trained on large datasets such as ImageNet or YFCC100M suggests that external data can rival or exceed the cumulative gains from years of FSL-specific algorithmic improvements.
  2. Incorporation of Modern Architectures: Focusing on contemporary architectures, especially vision transformers (ViT), the paper demonstrates their suitability for FSL, particularly when combined with pre-training on external data (a minimal sketch follows this list). Although such architectures have seen little prior use in FSL, the authors show that they substantially improve performance on standard FSL benchmarks.
  3. Fine-Tuning Strategies: To address domain shift between training and testing distributions, the paper presents fine-tuning as an effective strategy for generalizing to novel task distributions. A domain-specific learning-rate selection procedure adapts the model effectively across diverse domains, underscoring the importance of context-aware parameter adjustment at meta-test time.
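
To make the first two stages concrete, here is a minimal sketch, not the authors' released code: it loads a self-supervised-style ViT backbone via timm (a standard ImageNet checkpoint stands in for DINO weights) and attaches a ProtoNet-style nearest-prototype head, one common choice for such simple pipelines. The checkpoint name and the prototype_logits helper are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import timm

# Stage 1 (pre-training): a ViT backbone pre-trained on external data.
# A standard timm ImageNet checkpoint stands in for DINO weights here.
backbone = timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=0)

def prototype_logits(model, support_x, support_y, query_x, n_way):
    """ProtoNet-style head: class prototypes are the mean support embeddings;
    query logits are cosine similarities to the prototypes."""
    z_s = F.normalize(model(support_x), dim=-1)   # (n_support, d)
    z_q = F.normalize(model(query_x), dim=-1)     # (n_query, d)
    protos = torch.stack([z_s[support_y == c].mean(0) for c in range(n_way)])
    return z_q @ F.normalize(protos, dim=-1).t()  # (n_query, n_way)

# Stage 2 (meta-training) would sample few-shot episodes from the base classes and
# minimise the cross-entropy of these logits to update the backbone end-to-end.
```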

Results and Implications

The empirical results underscore a decisive performance advantage when pre-training is leveraged alongside modern architectures and fine-tuning methodologies. Key highlights include:

  • Models pre-trained with DINO on ImageNet1K substantially outperformed conventional methods in terms of FSL task accuracy across multiple benchmarks, including Mini-ImageNet and CIFAR-FS.
  • Pre-trained ViT backbones outperformed the traditional convolutional networks that have been standard in earlier FSL research.
  • Fine-tuning with a task-specific learning-rate search further improved few-shot classification, most notably in the cross-domain scenarios of the Meta-Dataset evaluation (illustrated in the sketch below).
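
The fine-tuning stage with a learning-rate search can be sketched as below. This is an illustrative approximation rather than the paper's exact protocol: the summary describes a domain-specific learning-rate selection, whereas this sketch picks the rate per task by scoring each candidate on an augmented copy of the support set, reusing the prototype_logits helper from the earlier sketch; the grid values and the augmentation are assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def finetune_with_lr_search(backbone, support_x, support_y, n_way,
                            lr_grid=(0.0, 1e-4, 1e-3, 1e-2), steps=20):
    """Fine-tune a copy of the backbone on the support set for each candidate
    learning rate and keep the copy that best classifies augmented support images;
    lr = 0.0 corresponds to skipping fine-tuning entirely."""
    aug_x = torch.flip(support_x, dims=[-1])  # stand-in augmentation: horizontal flip
    best_model, best_acc = backbone, -1.0
    for lr in lr_grid:
        model = copy.deepcopy(backbone)
        if lr > 0:
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            for _ in range(steps):
                # Prototypes come from the support images, queries from augmented views.
                logits = prototype_logits(model, support_x, support_y, aug_x, n_way)
                loss = F.cross_entropy(logits, support_y)
                opt.zero_grad()
                loss.backward()
                opt.step()
        with torch.no_grad():  # score this candidate on the augmented support set
            preds = prototype_logits(model, support_x, support_y, aug_x, n_way).argmax(-1)
            acc = (preds == support_y).float().mean().item()
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model
```

Including lr = 0.0 in the grid keeps the un-fine-tuned backbone as a fallback, which matters when fine-tuning helps little in-domain but pays off under domain shift.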

Future Directions

The paper points the way toward further research in FSL by highlighting the synergy between advances in self-supervised learning and FSL methodology. Future research may focus on:

  • Investigating additional self-supervised learning frameworks and data modalities to expand the scope of effective pre-training models.
  • Developing efficient computational strategies for parameter-heavy architectures to make the approaches more accessible for low-resource environments or embedded systems.
  • Broader evaluations of hybrid SSL-FSL models on more diverse and realistic task distributions, to identify biases inherited from foundation models.

Conclusion

By methodically analyzing the foundational aspects of pre-training, architecture selection, and fine-tuning, this research emphasizes the efficiency and effectiveness of simple yet strategically designed FSL pipelines. It challenges the FSL community to rethink conventional methodologies and encourages the integration of broader machine learning advancements into FSL paradigms.