Insights into "Debiased Pseudo Labeling for Zero-shot and Semi-Supervised Learning"
The paper "Debiased Pseudo-labeling for Zero-shot and Semi-Supervised Learning" addresses a fundamental issue in machine learning, specifically the inherent bias in pseudo-labeling, which is not widely recognized or deeply investigated in the current literature. Pseudo-labeling is a common technique employed in scenarios where a model is trained on labeled source data and then applied to unlabeled target data—typified in semi-supervised learning (SSL) and zero-shot learning (ZSL). However, pseudo-labels are often naturally imbalanced, causing systemic biases that affect the generalization of models.
The authors propose a novel framework, Debiased Pseudo Labeling (DebiasPL), to mitigate the bias arising from pseudo-labels. The approach incorporates counterfactual reasoning and adaptive margins to correct classifier biases and dynamically adjust the margins of each class according to the pseudo-label distribution imbalance. This technique is shown to yield significant improvements in both semi-supervised and zero-shot learning tasks by aligning classifier learning with true data distribution, rather than with the skewed pseudo-label distributions that conventionally persist.
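To make the general idea concrete, here is a minimal, PyTorch-style sketch of debiasing pseudo-labels with a running estimate of the pseudo-label class distribution. The class name `PseudoLabelDebiaser`, the EMA momentum, and the adjustment strength `lam` are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch: debiased pseudo-labeling via logit adjustment (assumption: the
# running pseudo-label marginal is tracked with an exponential moving average
# and subtracted, in log space, from the logits before thresholding).
import torch
import torch.nn.functional as F

class PseudoLabelDebiaser:
    def __init__(self, num_classes: int, momentum: float = 0.999, lam: float = 1.0):
        self.momentum = momentum  # EMA momentum for the class marginal
        self.lam = lam            # strength of the debiasing adjustment
        # Start from a uniform estimate of the pseudo-label distribution.
        self.marginal = torch.full((num_classes,), 1.0 / num_classes)

    @torch.no_grad()
    def update(self, probs: torch.Tensor) -> None:
        """Update the EMA of the pseudo-label class distribution.

        probs: (batch, num_classes) softmax outputs on unlabeled data.
        """
        batch_marginal = probs.mean(dim=0).detach().cpu()
        self.marginal = self.momentum * self.marginal + (1 - self.momentum) * batch_marginal

    def debias(self, logits: torch.Tensor) -> torch.Tensor:
        """Counterfactual-style adjustment: subtract the log of the current
        pseudo-label marginal so over-predicted classes are penalized."""
        marginal = self.marginal.to(logits.device)
        return logits - self.lam * torch.log(marginal + 1e-12)

# Usage in a FixMatch-like loop (illustrative):
# probs_weak = F.softmax(model(weak_aug_batch), dim=-1)
# debiaser.update(probs_weak)
# adjusted = F.softmax(debiaser.debias(model(weak_aug_batch)), dim=-1)
# conf, pseudo_y = adjusted.max(dim=-1)
# mask = conf.ge(0.95)   # confidence threshold, as in FixMatch
```

The debiased probabilities would then supply the hard pseudo-labels and confidence mask for the unlabeled loss, while the raw (unadjusted) classifier is used at inference time.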
Key Contributions
- Measurement of Pseudo-label Bias: The paper systematically demonstrates that pseudo-labels are inherently imbalanced and inject bias into models. This bias remains substantial even when models are trained and evaluated on balanced datasets, affecting both SSL and ZSL methodologies.
- Debiased Learning Framework: The authors propose a debiased learning framework that uses counterfactual reasoning to reduce classifier bias and an adaptive marginal loss to manage inter-class imbalance (see the sketch following this list). These components enable more accurate predictions without requiring prior knowledge of the true class distribution.
- Empirical Validation and Robustness: Extensive experiments show that DebiasPL significantly outperforms existing methods. On ImageNet-1K, accuracy improvements of 26% and 9% on SSL and ZSL tasks, respectively, demonstrate the framework's efficacy, and the gains hold across varied model architectures and datasets.
- Applicability to Long-tailed Settings: The paper highlights improved SSL performance in long-tailed settings where data is naturally imbalanced, demonstrating that DebiasPL performs well without requiring explicit assumptions about the underlying distribution.
- New Pipeline for Vision-Language Models: The framework establishes an effective pipeline for combining DebiasPL with vision-and-language pre-trained models such as CLIP, yielding stronger zero-shot learning performance by further mitigating the influence of imbalanced pseudo-labels.
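As referenced in the debiased learning framework item above, the adaptive marginal loss can be sketched as a margin-adjusted cross-entropy on pseudo-labeled examples. The sketch below follows the spirit of logit-adjusted losses, with per-class margins derived from the same running pseudo-label marginal as in the previous sketch; the function name, the scaling factor `tau`, and the exact margin formula are assumptions and may differ from the paper's formulation.

```python
# Sketch: adaptive-margin cross-entropy on pseudo-labeled data (assumption:
# margins come from the running pseudo-label marginal, so rarely predicted
# classes must be predicted with a larger raw logit during training).
import torch
import torch.nn.functional as F

def adaptive_margin_loss(logits: torch.Tensor,
                         pseudo_targets: torch.Tensor,
                         marginal: torch.Tensor,
                         tau: float = 0.5) -> torch.Tensor:
    """Cross-entropy with class-dependent margins.

    logits:         (batch, num_classes) raw classifier outputs.
    pseudo_targets: (batch,) hard pseudo-labels (long tensor).
    marginal:       (num_classes,) running pseudo-label distribution.
    tau:            scaling factor for the margins (illustrative default).
    """
    marginal = marginal.to(logits.device)
    # Shift each class's logit by the log of its pseudo-label frequency:
    # rarely predicted classes are shifted down more, so the network must
    # produce a larger raw logit to predict them during training, which
    # counteracts the imbalance at inference time.
    margins = tau * torch.log(marginal + 1e-12)          # (num_classes,)
    adjusted_logits = logits + margins.unsqueeze(0)       # broadcast over batch
    return F.cross_entropy(adjusted_logits, pseudo_targets)

# Usage (illustrative): this per-example loss would be masked by the
# confidence threshold from the previous sketch before averaging.
```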
Numerical Results and Implications
The paper provides strong numerical results that reinforce the value of addressing pseudo-label imbalance. This matters because traditional methods often overlook these biases, limiting applicability in real-world, long-tailed, and cross-domain scenarios. The authors report substantial accuracy gains across heterogeneous tasks and datasets, supporting DebiasPL's potential as an augmentation to existing and future learning methods.
Practically, DebiasPL could mitigate biases in enterprise applications, improving model robustness and fairness and supporting ethical AI considerations. Theoretically, it expands the dialogue on pseudo-labeling, encouraging further exploration of causal inference methods and adaptive algorithms for tackling learning biases.
Future Directions
While DebiasPL demonstrates impressive results, future research might integrate the framework more directly with different neural architectures and evaluate its adaptability across varying domains. Extensions could explore bias compensation in active learning, or application to self-supervised learning pipelines, where labeled data for downstream tasks is sparse and often naturally imbalanced.
In conclusion, this research underscores the importance of correcting pseudo-label bias to improve model performance when data is limited or imbalanced. The DebiasPL framework represents a substantial step forward in model training and sets a benchmark for future studies in pseudo-labeling and debiasing methodologies.