
Is Large-Scale Pretraining the Secret to Good Domain Generalization? (2412.02856v3)

Published 3 Dec 2024 in cs.CV and cs.LG

Abstract: Multi-Source Domain Generalization (DG) is the task of training on multiple source domains and achieving high classification performance on unseen target domains. Recent methods combine robust features from web-scale pretrained backbones with new features learned from source data, and this has dramatically improved benchmark results. However, it remains unclear if DG finetuning methods are becoming better over time, or if improved benchmark performance is simply an artifact of stronger pre-training. Prior studies have shown that perceptual similarity to pre-training data correlates with zero-shot performance, but we find the effect limited in the DG setting. Instead, we posit that having perceptually similar data in pretraining is not enough; and that it is how well these data were learned that determines performance. This leads us to introduce the Alignment Hypothesis, which states that the final DG performance will be high if and only if alignment of image and class label text embeddings is high. Our experiments confirm the Alignment Hypothesis is true, and we use it as an analysis tool of existing DG methods evaluated on DomainBed datasets by splitting evaluation data into In-pretraining (IP) and Out-of-pretraining (OOP). We show that all evaluated DG methods struggle on DomainBed-OOP, while recent methods excel on DomainBed-IP. Put together, our findings highlight the need for DG methods which can generalize beyond pretraining alignment.

Summary

  • The paper introduces the Alignment Hypothesis, suggesting that high alignment between image features and class text embeddings during pre-training is key to Domain Generalization (DG) success.
  • Utilizing novel DomainBed-IP/OOP data splits based on pre-training alignment, the study reveals that current DG methods perform well on In-Pretraining data but struggle significantly on Out-of-Pretraining samples.
  • Findings indicate that state-of-the-art DG methods heavily rely on pre-trained feature alignment and do not consistently outperform older methods on challenging low-alignment data, highlighting the need for new techniques.

The paper "Is Large-scale Pretraining the Secret to Good Domain Generalization?" explores the intricacies of Multi-Source Domain Generalization (DG), specifically examining whether recent advancements in DG are a result of genuine methodological progress or merely an artifact of increasingly robust large-scale pre-trained models. The paper critically evaluates current DG techniques, which often fine-tune models grounded in substantial pre-training, such as those employing CLIP-based architectures, across multiple source domains to perform well on unseen target domains.

Core contributions of this work include:

  1. Alignment Hypothesis: The paper introduces the Alignment Hypothesis, positing that DG success hinges on high alignment between image features and class label text embeddings achieved during pre-training. The authors argue this alignment is a more robust predictor of DG performance than previously suggested measures, such as perceptual similarity between pre-training and target datasets. This hypothesis is supported by experimental data showing that alignment scores correlate strongly with DG effectiveness, particularly indicating that low alignment typically results in suboptimal performance post-fine-tuning.
  2. DomainBed-IP/OOP Splits: Utilizing the Alignment Hypothesis, the authors propose a novel evaluation framework that segments data into In-Pretraining (IP) and Out-of-Pretraining (OOP) subsets based on their pre-training alignment scores. This split enables a granular analysis of DG methods, revealing that while these methods excel on IP data, they struggle considerably with OOP samples, which are characterized by weak alignment in pre-training; a minimal sketch of this alignment-based scoring and splitting appears after this list.
  3. Benchmarking DG Methods: The paper conducts extensive benchmarking of state-of-the-art DG methods on the DomainBed datasets, revealing that these methods achieve near-oracle performance on IP data but face significant challenges on OOP data. Surprisingly, recent methods do not consistently surpass older methods on the OOP splits, underscoring a prevalent reliance on pre-trained feature alignment.
  4. Need for Robust DG Techniques: The findings underscore a critical challenge for future research: to design DG methods capable of transcending the limitations imposed by pre-training alignment. The paper suggests that methods which enhance feature generalization capabilities beyond those in pre-trained models are imperative, particularly for tackling the low-alignment challenge posed by OOP data.
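
To make the Alignment Hypothesis and the IP/OOP split concrete, the sketch below shows one plausible way to score image-to-class-label alignment with a pre-trained CLIP checkpoint and to bucket evaluation samples by thresholding that score. It is an illustrative reconstruction rather than the authors' implementation: the checkpoint name, the "a photo of a ..." prompt template, and the fixed threshold are assumptions made for this example, not values taken from the paper.

```python
# Minimal sketch (not the authors' released code) of the two quantities the paper builds on:
# (1) an image-to-class-label alignment score from a CLIP-style pre-trained model, and
# (2) a threshold-based split of evaluation samples into In-Pretraining (IP) and
#     Out-of-Pretraining (OOP) subsets.
# The checkpoint, prompt template, and threshold below are illustrative assumptions.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()


@torch.no_grad()
def alignment_score(image: Image.Image, class_name: str) -> float:
    """Cosine similarity between an image embedding and its class-label text embedding."""
    inputs = processor(
        text=[f"a photo of a {class_name}"],  # assumed prompt template
        images=image,
        return_tensors="pt",
        padding=True,
    )
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (image_emb * text_emb).sum().item()


def split_ip_oop(samples, threshold=0.25):
    """Assign each (image, class_name) pair to an IP or OOP bucket by alignment score.

    `threshold` is a placeholder; the paper's actual split criterion over DomainBed
    may be chosen differently.
    """
    ip, oop = [], []
    for image, class_name in samples:
        score = alignment_score(image, class_name)
        (ip if score >= threshold else oop).append((image, class_name, score))
    return ip, oop
```

Under the Alignment Hypothesis, samples routed to the OOP bucket by such a score are exactly the ones on which fine-tuned DG methods would be expected to underperform.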

This examination reveals that while current DG methods demonstrate impressive results under favorable conditions, a significant performance gap remains when generalizing to data that is poorly aligned with its class labels in pre-training. The authors advocate for continued investigation into improving the inherent capabilities of DG methods to learn robust, domain-invariant features, especially for less-aligned, out-of-pretraining samples.
