
Self-training for Few-shot Transfer Across Extreme Task Differences (2010.07734v2)

Published 15 Oct 2020 in cs.CV, cs.AI, and cs.LG

Abstract: Most few-shot learning techniques are pre-trained on a large, labeled "base dataset". In problem domains where such large labeled datasets are not available for pre-training (e.g., X-ray, satellite images), one must resort to pre-training in a different "source" problem domain (e.g., ImageNet), which can be very different from the desired target task. Traditional few-shot and transfer learning techniques fail in the presence of such extreme differences between the source and target tasks. In this paper, we present a simple and effective solution to tackle this extreme domain gap: self-training a source domain representation on unlabeled data from the target domain. We show that this improves one-shot performance on the target domain by 2.9 points on average on the challenging BSCD-FSL benchmark consisting of datasets from multiple domains. Our code is available at https://github.com/cpphoo/STARTUP.

Citations (102)

Summary

  • The paper introduces STARTUP, a self-training approach leveraging unlabeled target data for few-shot learning across extreme domain gaps.
  • It details a three-step process involving teacher model pre-training, softly-labeled target data generation, and student model training with combined losses.
  • Empirical results show an average improvement of 2.9 points in one-shot classification on the challenging BSCD-FSL benchmark, underscoring its practical benefits in diverse fields.

Self-training for Few-shot Transfer Across Extreme Task Differences

The paper presents a methodological approach for few-shot learning across substantial domain gaps using self-training on unlabeled target-domain data. The challenge arises prominently in fields such as medical or satellite imagery, where large labeled datasets are scarce. Few-shot learning methods are typically pre-trained on a large, labeled base dataset within the same domain, which is infeasible in these settings: one must instead pre-train on a different source domain such as ImageNet, and the considerable gap between such source datasets and target datasets like X-rays or satellite images leads to poor performance with traditional few-shot and transfer learning methods.

Proposed Solution: Self-training Across Extreme Domain Gaps

The authors propose “Self Training to Adapt Representations To Unseen Problems,” abbreviated as STARTUP, which leverages the large amounts of unlabeled data available in novel domains to create a useful feature representation for few-shot learning. This approach entails a three-step process:

  1. Learn a teacher model on a base dataset using a standard cross-entropy loss.
  2. Construct a softly-labeled set from the target domain’s unlabeled data using predictions from the teacher model, thereby capturing inherent similarities and distinctions as perceived by the pre-trained model.
  3. Train a student model on both the base and the softly-labeled set using a combination of cross-entropy and KL divergence loss, along with a self-supervised (SimCLR) loss.
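The three-step procedure above can be sketched as follows. This is a minimal NumPy illustration of how the soft labels and the combined student objective fit together, not the authors' implementation: the loss weights `l_kl` and `l_ssl` are hypothetical placeholders for the paper's hyperparameters, and the SimCLR term is passed in as a precomputed scalar rather than implemented here.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_labels(teacher_logits):
    """Step 2: the teacher's predictions on unlabeled target
    images become soft labels over the base classes."""
    return softmax(teacher_logits)

def cross_entropy(student_logits, labels):
    """Steps 1 and 3: standard cross-entropy on the labeled base dataset."""
    p = softmax(student_logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def kl_divergence(student_logits, teacher_probs):
    """Step 3: KL(teacher || student) on the softly-labeled target set."""
    log_q = np.log(softmax(student_logits) + 1e-12)
    log_p = np.log(teacher_probs + 1e-12)
    return np.mean((teacher_probs * (log_p - log_q)).sum(axis=-1))

def startup_loss(base_logits, base_labels, target_logits, teacher_probs,
                 simclr_loss, l_kl=1.0, l_ssl=1.0):
    """Step 3: the student's combined objective — cross-entropy on the base
    data, KL to the teacher's soft labels on target data, plus a
    self-supervised (SimCLR) term supplied by the caller."""
    return (cross_entropy(base_logits, base_labels)
            + l_kl * kl_divergence(target_logits, teacher_probs)
            + l_ssl * simclr_loss)
```

In this sketch, a student matching the teacher exactly incurs zero KL loss, so the KL term penalizes the student only insofar as it drifts from the groupings the teacher induced on the target data.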

The theoretical underpinning of this strategy is that the grouping induced by the teacher model's predictions on target-domain data captures similarities relevant to the target task: even though the label spaces of the base and target domains are disjoint, these induced similarities can aid downstream classification.

Empirical Results and Implications

The paper evaluates STARTUP on the challenging BSCD-FSL benchmark, demonstrating its superiority over state-of-the-art few-shot and transfer learning methods, particularly in scenarios with extreme domain differences. STARTUP achieved an average improvement of 2.9 points on one-shot classification tasks over existing methods, with the most significant gains seen in tasks involving the CropDisease dataset.

The exploration of different initializations for the student model showed variable impacts across datasets, indicating that the choice of initialization strategy should be dataset-specific. Furthermore, the role of unlabeled data was underscored: unlabeled datasets irrelevant to the target domain provided no benefit, contrary to conventional wisdom in semi-supervised learning. This highlights the domain-specific nature of the task and the necessity of relevant unlabeled data.

Conclusion and Future Directions

The approach presented in this paper underlines the importance of leveraging unlabeled data to overcome the inadequacies of traditional few-shot learning in cross-domain scenarios. STARTUP bridges the gap by adapting representations to novel domains while incorporating meaningful inductive biases from base domain learning. The implications for practical applications are significant, paving the way for more effective deployment of few-shot learning systems in diverse domains with substantial domain differences. Future research could further refine the initialization strategies and explore alternative self-supervised learning methods within the STARTUP framework, possibly enhancing the efficacy across broader domains and task variances.
