
Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift (2312.03318v1)

Published 6 Dec 2023 in cs.LG, cs.CV, and stat.ML

Abstract: Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning). However, despite the popularity and compatibility of these techniques, their efficacy in combination remains unexplored. In this paper, we undertake a systematic empirical investigation of this combination, finding that (i) in domain adaptation settings, self-training and contrastive learning offer significant complementary gains; and (ii) in semi-supervised learning settings, surprisingly, the benefits are not synergistic. Across eight distribution shift datasets (e.g., BREEDs, WILDS), we demonstrate that the combined method obtains 3--8% higher accuracy than either approach independently. We then theoretically analyze these techniques in a simplified model of distribution shift, demonstrating scenarios under which the features produced by contrastive learning can yield a good initialization for self-training to further amplify gains and achieve optimal performance, even when either method alone would fail.

Citations (6)

Summary

  • The paper demonstrates that integrating self-training with contrastive learning (STOC) yields a 3–8% improvement in accuracy under distribution shifts.
  • Theoretical analysis reveals that contrastive learning enhances invariant feature representation, which self-training then exploits to refine predictions.
  • Empirical evaluations confirm that the STOC approach outperforms separate methods across benchmark datasets with subpopulation shifts.

Complementary Techniques for Machine Learning Under Distribution Shift

Self-Training and Contrastive Learning

Self-training and contrastive learning are two distinct approaches for improving a model's performance under distribution shift, i.e., when the distribution of the training (source) data differs from that of the test (target) data. Self-training refines a model by iteratively retraining it on its own confident predictions (pseudo-labels) for unlabeled data. Contrastive learning instead learns representations by pulling together different augmented views of the same data point while pushing apart views of different points.
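
Both ideas can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration rather than the paper's implementation: the function names, the SimCLR-style InfoNCE loss, the confidence threshold, and the temperature are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss: matching rows of z1/z2 are positives,
    every other row in the batch serves as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                   # pairwise cosine similarities
    targets = torch.arange(len(z1), device=z1.device)    # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def self_training_step(model, optimizer, x_unlabeled, threshold=0.9):
    """One self-training update: pseudo-label an unlabeled batch with the
    model's own confident predictions, then train on those pseudo-labels."""
    with torch.no_grad():
        probs = F.softmax(model(x_unlabeled), dim=1)
        confidence, pseudo_labels = probs.max(dim=1)
    keep = confidence >= threshold                       # discard low-confidence examples
    if not keep.any():
        return None
    loss = F.cross_entropy(model(x_unlabeled[keep]), pseudo_labels[keep])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The contrastive loss never looks at labels, while the self-training step relies entirely on the model's current predictions, which is why the quality of the underlying representation matters so much for the latter.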

Joint Benefit of the Combined Approach

When applied separately, each method provides its own benefits. This paper examines what happens when they are combined in a procedure termed Self-Training Over Contrastive learning (STOC): contrastive pretraining on unlabeled data, followed by self-training of the resulting classifier. Under distribution shift, the combination yields substantial improvements, boosting target accuracy by 3–8% over either technique applied independently. Notably, the same combination is not synergistic in standard semi-supervised settings without a shift, which underscores that the two methods are complementary specifically when the test distribution differs from the training distribution.
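
Concretely, a STOC-style pipeline can be organized in three stages, sketched below using the `info_nce_loss` and `self_training_step` helpers from the previous snippet. The stage ordering follows the paper's description (contrastive pretraining, then supervised fitting on source labels, then self-training on the target); the epoch counts, learning rates, and optimizer choice are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def stoc(encoder, feat_dim, num_classes, views_loader, source_loader,
         target_loader, epochs=(10, 5, 5), threshold=0.9):
    """Illustrative three-stage STOC-style pipeline (hypothetical names and
    hyperparameters, not the paper's exact configuration)."""
    pretrain_ep, source_ep, self_train_ep = epochs
    head = nn.Linear(feat_dim, num_classes)
    model = nn.Sequential(encoder, head)

    # Stage 1: contrastive pretraining on unlabeled source + target views.
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    for _ in range(pretrain_ep):
        for v1, v2 in views_loader:               # two augmented views per example
            loss = info_nce_loss(encoder(v1), encoder(v2))
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 2: fit the classifier head on labeled source data.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(source_ep):
        for x, y in source_loader:
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 3: self-train on unlabeled target data using pseudo-labels.
    for _ in range(self_train_ep):
        for x, _ in target_loader:                # target labels are never used
            self_training_step(model, opt, x, threshold=threshold)
    return model
```

The key design point is the ordering: self-training starts from a classifier built on contrastively pretrained features rather than from scratch, so its pseudo-labels on the target domain are more reliable from the first round.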

Theoretical Analysis

To understand this synergy, the authors analyze it theoretically in a simplified model of distribution shift in which features that help predictions in the source domain may not be as helpful in the target domain (and vice versa). The analysis shows that, under certain conditions, contrastive learning amplifies the features that remain predictive across domains, while self-training capitalizes on this improved representation to refine the final predictions. In short, contrastive learning strengthens the shift-invariant features and thereby provides a better starting point for self-training, which then further improves out-of-distribution generalization.
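
One generic way to picture this kind of toy model (the notation below is illustrative and not necessarily the paper's exact construction) is to split the input into an invariant component, which predicts the label in both domains, and a domain-dependent component, which is predictive only in the source:

```latex
% Illustrative decomposition of the input into an invariant and a
% domain-dependent component (generic notation, assumed for this sketch).
\[
  x = (x_{\mathrm{inv}},\, x_{\mathrm{dom}}), \qquad
  y = \operatorname{sign}\!\big(w_{\mathrm{inv}}^{\top} x_{\mathrm{inv}}\big),
\]
\[
  \text{source domain: } x_{\mathrm{dom}} \text{ is correlated with } y,
  \qquad
  \text{target domain: } x_{\mathrm{dom}} \text{ carries little or misleading signal about } y.
\]
```

Under such a setup, a classifier fit only on source labels can lean on the domain-dependent component and fail on the target; contrastive pretraining that emphasizes the invariant component hands self-training a representation in which its pseudo-labels are mostly correct, allowing the iterative retraining to improve rather than reinforce errors.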

Empirical Evaluation

Empirical evaluations confirm that contrastive pretraining followed by self-training significantly outperforms either approach on its own across various benchmark datasets for domain adaptation, further cementing the utility of STOC. These experiments are conducted in controlled settings, including datasets designed to exhibit subpopulation shifts, a common challenge in practical machine learning scenarios.

Implications and Future Work

The work points to an exciting direction in which combining existing techniques can make models more robust to distribution shifts. It raises the possibility of integrating self-training and contrastive learning into a unified training framework, which could be a promising path for future research. It also suggests probing the complementary nature of these techniques beyond the covariate-shift assumptions made here, for an even broader impact.
