- The paper demonstrates that integrating self-training with contrastive learning (STOC) yields a 3–8% improvement in accuracy under distribution shifts.
- Theoretical analysis reveals that contrastive learning enhances invariant feature representation, which self-training then exploits to refine predictions.
- Empirical evaluations confirm that STOC outperforms either technique applied on its own across benchmark datasets with subpopulation shifts.
Complementary Techniques for Machine Learning Under Distribution Shift
Self-Training and Contrastive Learning
Self-training and contrastive learning are two distinct approaches to improving machine learning models' performance under distribution shift - situations where the training data distribution differs from that of the test data. Self-training refines a model by iteratively retraining it on its own predictions (pseudo-labels). Contrastive learning, by comparison, learns representations by enforcing similarity between different augmented views of the same data point while pushing apart the representations of other data points.
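To make the distinction concrete, here is a minimal PyTorch sketch of the two objectives. It is an illustration under assumed inputs (a hypothetical `classifier` module, batches of embeddings from two augmented views, a confidence threshold), not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: embeddings z1, z2 of two augmented
    views of the same batch are pulled together; all other pairs are pushed apart."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                       # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)   # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def self_training_loss(classifier, x_unlabeled, threshold=0.9):
    """Self-training step: the model's own confident predictions on unlabeled
    inputs become pseudo-labels that it is then trained to match."""
    with torch.no_grad():
        probs = classifier(x_unlabeled).softmax(dim=1)
        confidence, pseudo_labels = probs.max(dim=1)
        mask = confidence > threshold          # keep only confident predictions
    logits = classifier(x_unlabeled)
    if not mask.any():                         # nothing confident yet: zero loss
        return logits.sum() * 0.0
    return F.cross_entropy(logits[mask], pseudo_labels[mask])
```

In this sketch the contrastive loss would be used to pretrain an encoder, while the pseudo-labeling loss would be applied later when fine-tuning a classifier on top of it.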
Joint Benefit of the Combined Approach
When these two methods are applied separately, each provides its own benefit. This paper explores what happens when they are combined in a process termed Self-Training Over Contrastive learning (STOC). Surprisingly, the combination yields substantial gains when the distribution shifts from training to test data, performing significantly better than either technique applied independently: the joint method boosts accuracy by 3–8%, highlighting the complementary nature of these methods under distribution shifts.
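The two-stage recipe can be sketched as follows, reusing the loss functions from the previous snippet. The loaders, `augment` function, optimizer handling, and epoch counts are illustrative assumptions rather than the authors' released code.

```python
import itertools
import torch.nn.functional as F

def run_stoc(encoder, head, optimizer, source_loader, target_loader, augment,
             pretrain_epochs=100, self_train_epochs=10):
    def step(loss):
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Stage 1: contrastive pretraining on inputs from both domains (labels unused).
    for _ in range(pretrain_epochs):
        for x, _ in itertools.chain(source_loader, target_loader):
            step(info_nce_loss(encoder(augment(x)), encoder(augment(x))))

    # Stage 2a: fit the classification head on labeled source data (one pass for brevity).
    for x, y in source_loader:
        step(F.cross_entropy(head(encoder(x)), y))

    # Stage 2b: self-train on target data, starting from the contrastively
    # learned features; target labels are never used, pseudo-labels come
    # from the model itself.
    def model(x):
        return head(encoder(x))
    for _ in range(self_train_epochs):
        for x, _ in target_loader:
            step(self_training_loss(model, x))
```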
Theoretical Analysis
To understand this synergy, the authors analyze a simplified distribution-shift model in which features that aid prediction in the source domain may be less helpful in the target domain, and vice versa. The analysis shows that contrastive learning can, under certain conditions, amplify the features that remain predictive across domains, while self-training capitalizes on this improved representation to refine the final predictions. In other words, contrastive learning boosts features invariant to the distribution shift, creating a better starting point for self-training, which then further improves out-of-distribution generalization.
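The flavor of this setup can be reproduced in a toy NumPy simulation. The generative model, noise levels, and two-feature layout below are assumptions chosen for illustration, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
y = rng.choice([-1, 1], size=n)

# Source domain: both the invariant and the domain-specific feature track y.
x_src = np.stack([y + 0.5 * rng.standard_normal(n),        # invariant feature
                  y + 0.5 * rng.standard_normal(n)], 1)    # domain-specific feature

# Target domain: the domain-specific feature decorrelates from the label.
y_tgt = rng.choice([-1, 1], size=n)
x_tgt = np.stack([y_tgt + 0.5 * rng.standard_normal(n),
                  rng.standard_normal(n)], 1)

def accuracy(x, labels, weights):
    return np.mean(np.sign(x @ weights) == labels)

# A least-squares classifier fit on source weights both features, so its
# target accuracy drops; the invariant feature alone still transfers well.
w = np.linalg.lstsq(x_src, y, rcond=None)[0]
print("source acc:", accuracy(x_src, y, w), "target acc:", accuracy(x_tgt, y_tgt, w))
print("invariant-only target acc:", accuracy(x_tgt, y_tgt, np.array([1.0, 0.0])))
```

Running this shows that the source-fit weights transfer imperfectly because they lean on the feature whose correlation with the label disappears under the shift, whereas the invariant direction alone still classifies the target well; the paper's argument is that contrastive pretraining pushes the learned representation toward such invariant directions, which self-training then sharpens.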
Empirical Evaluation
Empirical investigations confirm that contrastive pretraining followed by self-training significantly outperforms either approach alone across benchmark datasets for domain adaptation, further cementing the utility of STOC. The experiments are conducted in controlled settings that include datasets exhibiting subpopulation shifts - a common challenge in practical machine learning scenarios.
Implications and Future Work
The work points to an exciting direction in which combining existing techniques can make models more robust to distribution shifts. It raises the possibility of integrating self-training and contrastive learning into a unified training framework, a promising path for future research. It also suggests probing the complementary nature of these techniques beyond the covariate-shift assumptions made in the analysis, for an even broader impact.