Noisy Pseudo-Label Filtering
- Noisy pseudo-label filtering is a set of strategies that detect, filter, and refine self-generated labels to mitigate prediction errors and confirmation bias.
- The methodology encompasses confidence thresholding, adaptive mixture modeling, and uncertainty-based techniques to selectively reweight training samples.
- Applications span image classification, semantic segmentation, domain adaptation, and temporal action localization, contributing significantly to robust learning.
A noisy pseudo-label filter is any of a family of algorithmic strategies for identifying, filtering, refining, or reweighting pseudo-labels tainted with prediction errors in semi-supervised, weakly supervised, and learning-with-noisy-labels settings. Pseudo-label generation, a central component of modern SSL and robust learning pipelines, self-assigns proxy labels to unlabelled or weakly labelled examples using a network's predictions or a teacher model. Because these pseudo-labels are imperfect, particularly under high noise, domain shift, or low-data regimes, specialized mechanisms are required to suppress, remove, or adaptively down-weight their influence during training. The research landscape has evolved rapidly, encompassing confidence-based thresholding, adaptive mixture modeling, uncertainty quantification, neighbor-consensus schemes, contrastive embedding strategies, as well as loss shaping and sample-division methods. These approaches aim to maximize the effectiveness of self-training while minimizing the confirmation bias and error propagation intrinsic to noisy pseudo-supervision.
1. Core Principles and Motivation
Noisy pseudo-label filtering addresses the challenge of confirmation bias when learning from self-generated labels under imperfect supervision. In settings where ground-truth labels are missing, incomplete, or heavily corrupted, the model produces pseudo-labels that can be correct or noisy. Early works relied on simple confidence thresholding—retaining pseudo-labels with maximum softmax scores above a fixed value—yet found this approach brittle: discarding too many correct labels early and letting through many incorrect labels later.
Contemporary research shows that:
- Confidence alone is only moderately predictive of correctness; many medium-confidence pseudo-labels are correct, and discarding them discards useful information (Scherer et al., 2022).
- Adaptive, model-evolving schemes (e.g., mixture modeling of confidence distributions) yield more robust separation between correct and noisy pseudo-labels (Zhu et al., 2023).
- Filtering must balance the risk of overfitting to noise (if too permissive) and underfitting (if too strict).
This framework is now applied in a variety of domains, including image classification under heavy label noise, semantic segmentation, unsupervised domain adaptation, temporal action localization, speech recognition, graph-based domain transfer, and cross-modal retrieval.
2. Algorithmic Taxonomy of Noisy Pseudo-Label Filters
Filtering mechanisms can be organized as follows:
- Confidence Thresholding: Discards pseudo-labels whose softmax confidence falls below a fixed or adaptively chosen threshold, typically computed from the maximum predicted class probability per sample or per pixel (Scherer et al., 2022, Jin et al., 2022). In semantic segmentation, per-class thresholds or decile-based discards are also applied (see the sketch after this list).
- Mixture Model–Based Adaptive Thresholding: Fits a parametric mixture (e.g., a Beta or Gaussian mixture model) to the distribution of pseudo-label confidences and assigns each sample a filtering weight or posterior probability of being "correct." This soft-weights labels rather than discarding them, and the weighting adapts as the model improves (Zhu et al., 2023, Ortego et al., 2019).
- Uncertainty-Based Filtering: Utilizes uncertainty metrics such as predictive entropy, probability margin, variation ratio, or MC-Dropout variance, and retains only those pseudo-labels that are both high-confidence and low-uncertainty (Rizve et al., 2021).
- Neighborhood and Consensus Filtering: Calculates the agreement of pseudo-labels with those of feature-space nearest neighbors to smooth out isolated errors and leverages robust prototype assignment to correct noisy clusters (Yin et al., 9 May 2024, Chen et al., 2023, Saravanan et al., 7 Feb 2024).
- Energy-Based and Loss-Based Filters: Directly leverages model energy (a free-energy/sum-logit measure) or cross-entropy loss for sample selection. Global and (per-class) adaptive energy/loss thresholds distinguish inliers from outliers or noisy from clean (Meng et al., 23 Apr 2025, Ortego et al., 2019).
- Noisy Pseudo-Label Calibration: Uses auxiliary modules (e.g., top-K high-reliability cluster members, context-aware corrections, or history-based priors) to re-assign pseudo-labels based on robust prototypes, temporal context, or sample prediction histories (Yin et al., 9 May 2024, Zhang et al., 19 Jan 2025, Chen et al., 2021).
- Consistency and Contrastive Re-weighting: Employs consistency scores (e.g., epochwise class prediction stability or pseudo-class margin) to separate ambiguous from refinable pseudo-labels (Liu et al., 19 Sep 2025), and sometimes refines hard samples via contrastive instance discrimination (Meng et al., 23 Apr 2025, Liu et al., 27 Feb 2024).
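To make the thresholding variants concrete, the following minimal Python sketch contrasts a fixed confidence cutoff with a per-class adaptive (quantile-based) cutoff. The function names, the quantile heuristic, and the synthetic data are illustrative assumptions rather than the exact rule of any cited method.

```python
import numpy as np

def fixed_threshold_mask(probs: np.ndarray, tau: float = 0.95) -> np.ndarray:
    """Keep a pseudo-label only if its max softmax probability exceeds a fixed tau."""
    return probs.max(axis=1) >= tau

def per_class_adaptive_mask(probs: np.ndarray, quantile: float = 0.8) -> np.ndarray:
    """Keep pseudo-labels whose confidence exceeds a per-class quantile threshold,
    so that classes the model is currently less confident about are not starved.
    (Illustrative heuristic, not the rule of any specific cited method.)"""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    mask = np.zeros(len(probs), dtype=bool)
    for c in np.unique(pred):
        idx = pred == c
        # threshold adapts to the confidence distribution of class c at this step
        tau_c = np.quantile(conf[idx], quantile)
        mask[idx] = conf[idx] >= tau_c
    return mask

# Example: probs is an (N, C) array of softmax outputs on unlabeled data
probs = np.random.dirichlet(alpha=np.ones(10), size=256)
kept_fixed = fixed_threshold_mask(probs)
kept_adaptive = per_class_adaptive_mask(probs)
```

In practice, the resulting mask gates the unsupervised loss term; per-class adaptation is one way to avoid starving classes that the model currently finds hard.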
3. Mathematical Formulations and Representative Algorithms
A selection of the most prevalent mathematical definitions:
- Threshold-Based Filtering: Only samples with maximum class probability above a threshold are retained for self-training, i.e., the pseudo-label for $x$ is kept if $\max_c\, p(c \mid x) \ge \tau$ (Goel et al., 2022).
- Mixture Model Weighting: Fit a probabilistic mixture to the pseudo-label confidences $c_i$ and use the posterior of the "correct" component, $w_i = p(\text{correct} \mid c_i)$, as a soft weight in the unsupervised loss (Zhu et al., 2023).
- Uncertainty-Aware Selection: A positive pseudo-label is selected if $p(c \mid x) \ge \tau_p$ and $u(p(c \mid x)) \le \kappa_p$, with $u(\cdot)$ an uncertainty metric (Rizve et al., 2021).
- Energy-Based Filtering: Retain $x$ if its free energy $E(x) = -\log \sum_c \exp(f_c(x))$ falls below a threshold built from global/class-specific EMA statistics (Meng et al., 23 Apr 2025).
- Prototype Calibration: For cluster $k$, the robust prototype is $p_k = \frac{1}{K} \sum_{i \in \mathcal{T}_k} f(x_i)$, where $\mathcal{T}_k$ indexes the top-K intra-cluster members by neighbor-graph connectivity (Yin et al., 9 May 2024).
- Noise Correction Loss: For a noisy mask $\hat{y}$ with prediction $p$, a noise-tolerant loss with exponent $q$ scheduled from 2 down to 1 controls sensitivity to noise (Zhang et al., 18 Jul 2024).
- Consistency/Reweighting: Pseudo-label Consistency Score $s_{i,c} = n_{i,c} / E$, where $n_{i,c}$ is the number of epochs class $c$ was predicted for sample $i$ over $E$ training epochs (Liu et al., 19 Sep 2025).
Pseudocode and training loops follow these formulations, with multi-stage operation (warmup, selection, filtering, self-training), sample division (split into clean/ambiguous/noisy), auxiliary modules (consistency discriminator, context-averaged boundaries), and iterative refinement.
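As a hedged illustration of the warmup-then-divide pattern described above, the sketch below fits a two-component Gaussian mixture to per-sample losses and splits samples into clean, ambiguous, and noisy groups with a soft clean-posterior weight. The cut values, variable names, and synthetic loss distribution are assumptions; the cited methods differ in the statistic modeled (loss versus confidence) and in how the posterior is consumed.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def divide_by_gmm(per_sample_loss: np.ndarray,
                  clean_cut: float = 0.7,
                  noisy_cut: float = 0.3):
    """Fit a 2-component GMM to per-sample losses and return
    (posterior of being clean, clean/ambiguous/noisy index arrays).
    Cut values are illustrative assumptions."""
    losses = per_sample_loss.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=1e-4)
    gmm.fit(losses)
    # the component with the smaller mean loss is treated as the "clean" mode
    clean_component = int(np.argmin(gmm.means_.ravel()))
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    clean = np.where(p_clean >= clean_cut)[0]
    noisy = np.where(p_clean <= noisy_cut)[0]
    ambiguous = np.where((p_clean > noisy_cut) & (p_clean < clean_cut))[0]
    return p_clean, clean, ambiguous, noisy

# p_clean can serve as a soft weight on the unsupervised loss, while the noisy
# split is relabeled or dropped and the ambiguous split down-weighted.
losses = np.concatenate([np.random.gamma(2.0, 0.2, 900),   # mostly small losses
                         np.random.gamma(6.0, 0.6, 100)])  # heavy-tailed noisy losses
p_clean, clean_idx, ambiguous_idx, noisy_idx = divide_by_gmm(losses)
```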
4. Key Applications and Domains
Noisy pseudo-label filter techniques are now integral to multiple learning paradigms:
- Robust Learning with Noisy Labels: Disentangles clean and noisy samples, facilitating SSL on "clean" examples while adaptively filtering and relabeling noisy ones (Goel et al., 2022, Chen et al., 2021, Liu et al., 27 Feb 2024).
- Semi-Supervised Learning (SSL): SSL pipelines such as FixMatch, MeanTeacher, MixMatch, and their successors deploy noisy pseudo-label filters to exploit large unlabeled sets safely (Zhu et al., 2023, Scherer et al., 2022).
- Domain Adaptation and Transfer: Both source-free UDA and graph domain adaptation methods filter out unreliable pseudo-labels to avoid model drift (Meng et al., 23 Apr 2025, Chen et al., 17 Mar 2024, Wang et al., 1 Aug 2025).
- Semantic Segmentation and Structured Prediction: In semantic segmentation, per-pixel confidence, soft weighting, and masking suppress noise at high spatial resolution; evaluated through precision, recall, and mIoU metrics (Scherer et al., 2022).
- Temporal Action Localization and Speech Recognition: In action localization and ASR, joint modeling of localization/classification and context-corrected filtering have demonstrated large improvements under partial or noisy supervision (Zhang et al., 19 Jan 2025, Zhou et al., 10 Jul 2024, Jin et al., 2022).
- Cross-Modality and Cross-Instance Matching: Person re-identification and cross-modal retrieval utilize prototype calibration, neighbor-graph consensus, and pseudo-label consistency to build noise-robust correspondence and matching structures (Yin et al., 9 May 2024, Liu et al., 19 Sep 2025).
5. Empirical Benchmarks and Performance Insights
The empirical literature establishes several key trends:
- Accuracy Gains Under High Noise: PARS improves test accuracy by +12 pp (absolute) on CIFAR-100 with 90% symmetric label noise and +27 pp under strong low-resource restrictions (Goel et al., 2022). PGDF achieves marked increases in clean-sample retention and test accuracy by using history-based priors to rescue hard samples (Chen et al., 2021).
- Stability Across Data Regimes/Distributions: Distribution-robust filters using BMM or GMM adapt well to non-uniform, out-of-distribution, and instance-dependent noise, surpassing traditional small-loss or confidence-only heuristics (Ortego et al., 2019, Zhu et al., 2023).
- Adaptive and Soft Weighting Superiority: Soft, probabilistic, or continuous weighting of pseudo-labels, as opposed to hard masking, consistently outperforms static cutoff rules, particularly as the model's calibration drifts across training (Zhu et al., 2023, Scherer et al., 2022, Rizve et al., 2021).
- Critical Role of Filtering for SSL/Domain Adaptation: All leading SFDA, SSL, and clustering-based UDA approaches integrate pseudo-label filtering. E.g., EBPR’s energy-based filter plus contrastive hard-sample refinement outperforms prior SFDA methods by 0.3–1.0pp absolute on Office-31, Office-Home, and VisDA-C (Meng et al., 23 Apr 2025).
- Sample Division Granularity Matters: Multi-stage splits ("clean," "ambiguous"/"hard," and "noisy") allow more nuanced sample utilization, with reweighting and relabeling—rather than outright discarding—yielding the strongest performance (Chen et al., 2021, Scherer et al., 2022, Yin et al., 9 May 2024).
- Cross-Task and Cross-Domain Transferability: Pseudo-label filtering’s principles have generalized success across image, text, audio, temporal, and graph data, suggesting broad applicability of the core algorithmic principles.
6. Limitations, Practical Considerations, and Future Directions
Challenges and ongoing avenues include:
- Misalignment and Confirmation Bias: Overreliance on model-generated predictions risks reinforcing initial mistakes. Approaches such as co-teaching, dual-branch decoupling, or neighbor-consistency checks seek to mitigate this effect (Chen et al., 2021, Wang et al., 1 Aug 2025); a minimal neighbor-consensus sketch follows this list.
- Computational Overhead: While mixture modeling and neighbor search add only modest overhead (e.g., <2% in (Zhu et al., 2023)), reliance on history-based or cascaded modules can increase total training cost, as observed with history-based prior-generation modules (Chen et al., 2021).
- Threshold and Hyperparameter Sensitivity: Adaptive, data-driven thresholding is essential, as fixed hyperparameters often fail in practice. Some methods propose using training dynamics or even meta-learning to set thresholds (Zhu et al., 2023, Zhou et al., 10 Jul 2024).
- Retrieval and Prototype Anchoring: Label-retrieval-augmented and neighbor-prototype approaches require robust pretrained features or large-scale retrieval infrastructure (Chen et al., 2023, Yin et al., 9 May 2024).
- Handling Open-Set and Instance-Dependent Noise: Future work aims to extend current filters to open-set or class-dependent noise settings (Chen et al., 2021).
- Generalization to Highly Structured Outputs: Filtering strategies for highly structured, high-dimensional outputs (e.g., segmentation masks, spatiotemporal tubes) need further research, particularly around partial or localized pseudo-label correction (Scherer et al., 2022, Zhang et al., 19 Jan 2025, Zhang et al., 18 Jul 2024).
- Unified Frameworks: Combining aspects such as uncertainty, neighborhood-consensus, and adaptive weighting within a single end-to-end pluggable filter remains an active area, as evidenced by the proliferation of hybrid and multi-module solutions (Liu et al., 27 Feb 2024, Liu et al., 19 Sep 2025).
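One simple way to operationalize the neighbor-consistency idea mentioned in the list above is a k-nearest-neighbor agreement filter in feature space. The sketch below is a hypothetical illustration (the function, parameter names, and agreement ratio are not drawn from any cited paper): a pseudo-label is kept only when a sufficient fraction of its nearest neighbors carry the same label.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_agreement_mask(features: np.ndarray,
                       pseudo_labels: np.ndarray,
                       k: int = 10,
                       min_agreement: float = 0.6) -> np.ndarray:
    """Keep sample i only if at least min_agreement of its k nearest
    feature-space neighbors carry the same pseudo-label."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)           # first neighbor is the sample itself
    neighbor_labels = pseudo_labels[idx[:, 1:]]
    agreement = (neighbor_labels == pseudo_labels[:, None]).mean(axis=1)
    return agreement >= min_agreement

# Example usage with 512-d embeddings and integer pseudo-labels
feats = np.random.randn(1000, 512)
plabels = np.random.randint(0, 10, size=1000)
keep = knn_agreement_mask(feats, plabels)
```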
7. Summary Table: Notable Noisy Pseudo-Label Filters and Their Core Mechanisms
| Approach | Filtering Principle | Key Dataset / Result |
|---|---|---|
| PARS (Goel et al., 2022) | Confidence threshold + robust loss | CIFAR-100@90%: +12pp accuracy |
| SPF (Zhu et al., 2023) | Mixture model (Beta/Gauss) posterior weights | CIFAR-10/100, mini-ImageNet |
| UPS (Rizve et al., 2021) | High-confidence + low-uncertainty | CIFAR-10: +23pp vs. plain PL |
| PGDF (Chen et al., 2021) | Sample history prior + GMM | Retains >17% more clean samples |
| EBPR (Meng et al., 23 Apr 2025) | Clustered energy level EMA | Office-Home, VisDA-C SOTA |
| PCSR (Liu et al., 19 Sep 2025) | GMM + epochwise consistency score | Flickr30K@80% noise: +4.4pp R@1 |
| Noisy Pseudo-label Calibration (Yin et al., 9 May 2024) | Neighbor-consensus prototype | SYSU-MM01: +10.3% Rank-1 |
| DRPL (Ortego et al., 2019) | Online relabeling + BMM | Robust to OOD/non-uniform noise |
| APL (Zhou et al., 10 Jul 2024) | Joint class/loc score + ICD | +4–6pp mAP on THUMOS14 |
| SARI (Saravanan et al., 7 Feb 2024) | KNN voting + reliable set curation | CIFAR/CUB crowd: +5pp over prior |
References
- "PARS: Pseudo-Label Aware Robust Sample Selection for Learning with Noisy Labels" (Goel et al., 2022)
- "Towards Self-Adaptive Pseudo-Label Filtering for Semi-Supervised Learning" (Zhu et al., 2023)
- "Pseudo-Label Noise Suppression Techniques for Semi-Supervised Semantic Segmentation" (Scherer et al., 2022)
- "In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning" (Rizve et al., 2021)
- "Sample Prior Guided Robust Model Learning to Suppress Noisy Labels" (Chen et al., 2021)
- "Energy-Based Pseudo-Label Refining for Source-free Domain Adaptation" (Meng et al., 23 Apr 2025)
- "Nested Graph Pseudo-Label Refinement for Noisy Label Domain Adaptation Learning" (Wang et al., 1 Aug 2025)
- "Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels" (Chen et al., 2023)
- "Rethinking Pseudo-Label Guided Learning for Weakly Supervised Temporal Action Localization from the Perspective of Noise Correction" (Zhang et al., 19 Jan 2025)
- "Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification" (Yin et al., 9 May 2024)
- "Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning" (Saravanan et al., 7 Feb 2024)
- "Uncertainty-Aware Pseudo-Label Filtering for Source-Free Unsupervised Domain Adaptation" (Chen et al., 17 Mar 2024)
- "PLReMix: Combating Noisy Labels with Pseudo-Label Relaxed Contrastive Representation Learning" (Liu et al., 27 Feb 2024)
- "Towards Robust Learning with Different Label Noise Distributions" (Ortego et al., 2019)
- "PCSR: Pseudo-label Consistency-Guided Sample Refinement for Noisy Correspondence Learning" (Liu et al., 19 Sep 2025)
- "Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization" (Zhou et al., 10 Jul 2024)
- "Learning Camouflaged Object Detection from Noisy Pseudo Label" (Zhang et al., 18 Jul 2024)