Cross-Pseudo-Labeling in Semi-supervised Learning

Updated 29 May 2026

Cross-pseudo-labeling is a semi-supervised paradigm where multiple models exchange pseudo-labels to mitigate bias and reduce cumulative errors.
It strategically leverages model diversity and confidence thresholds to refine supervision, enhancing accuracy in tasks like detection and speech recognition.
Implementations integrate techniques such as MixUp regularization, entropy weighting, and adaptive loss fusion to achieve robust and effective training.

Cross-pseudo-labeling is a general paradigm in semi-supervised and weakly supervised learning in which multiple models, branches, or agents select or refine pseudo-labels for each other, rather than relying solely on self-generated pseudo-labels. This setup exploits model or view diversity to reduce confirmation bias and error accumulation, and has been instantiated across classification, detection, segmentation, language modeling, in-context learning, audio-visual correspondence, action recognition, and cross-lingual speech recognition. The cross-pseudo-labeling motif appears under distinct names (e.g., cross-model pseudo-labeling, cross selection, cross-labeling supervision), but all implementations share one foundational property: the models depend on each other in the pseudo-labeling step.

1. Core Principles and Definitions

Cross-pseudo-labeling distinguishes itself from canonical self-training by using a collaborative or competitive structure: at least two predictive functions (models, subnetworks, branches, or architectures) alternately supply supervision to each other in the form of pseudo-labels. The predictive entities may be identical in architecture or purposely diversified to make their representations more complementary. The principal rationale is to mitigate the undetected self-errors and confirmation bias that arise when a single model trains exclusively on its own high-confidence outputs (Ma et al., 2022, Yao et al., 2022, Xu et al., 2021).

Formally, given unlabeled data $U$, two models $f_1, f_2$ generate pseudo-labels $y^*_1, y^*_2$ for each input, typically accepting only those predictions above a confidence threshold. Each model trains on the other's pseudo-labeled data, optionally fusing this with labeled data if available. This exchange can be symmetrically enforced or selectively filtered for further robustness.
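The exchange described above can be sketched in a few lines. This is a minimal illustration rather than any cited paper's implementation: the function names `confident_labels` and `cross_exchange`, the toy probability lists, and the use of `None` to mark a rejected sample are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def confident_labels(probs: Sequence[Sequence[float]], tau: float) -> List[Optional[int]]:
    """Argmax class per sample, or None when the top probability is below tau."""
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda k: p[k])
        labels.append(best if p[best] >= tau else None)
    return labels


def cross_exchange(
    probs_1: Sequence[Sequence[float]],
    probs_2: Sequence[Sequence[float]],
    tau: float = 0.9,
) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Return (targets for f_1, targets for f_2); each model receives its peer's labels."""
    y_star_1 = confident_labels(probs_1, tau)  # pseudo-labels produced by f_1
    y_star_2 = confident_labels(probs_2, tau)  # pseudo-labels produced by f_2
    return y_star_2, y_star_1  # swapped: f_1 learns from f_2 and vice versa


# Hypothetical softmax outputs of f_1 and f_2 on three unlabeled inputs.
probs_f1 = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
probs_f2 = [[0.70, 0.30], [0.03, 0.97], [0.05, 0.95]]

targets_f1, targets_f2 = cross_exchange(probs_f1, probs_f2, tau=0.9)
print(targets_f1)  # labels f_1 will train on (from f_2)
print(targets_f2)  # labels f_2 will train on (from f_1)
```

In this toy case the second input is uncertain for f_1 but confident for f_2, so f_1 still receives a training target for it, which a single self-training model would have discarded.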

A related strategy is cross-task or cross-lingual pseudo-labeling, where a model trained on one domain or language is used to generate initial pseudo-labels on a new domain (as in transfer learning for ASR (Likhomanenko et al., 2023) or in LLM label transfer for ICL (Chen et al., 28 Oct 2025)).

2. Algorithmic Instantiations

Canonical cross-pseudo-labeling is instantiated via multiple architectures:

Dual-network cross-selection (CroSel): Maintains two identical networks with memory banks of historical softmax outputs, using cross-selection criteria (candidate-label set check, stability, confidence) to select pseudo-labels for each other; additionally employs a MixUp-based co-mix consistency term (Tian et al., 2023).
Cross-rectifying in object detection (CrossRectify): Parallel detectors cross-validate proposed bounding boxes using IoU matching and select the higher-confidence peer prediction in case of disagreement, replacing erroneous proposals and preventing reinforcement of self-errors (Ma et al., 2022).
Co-training with within- and cross-group supervision (DFCPS): Two groups of FixMatch-inspired subnetworks, each with weak and strong augmentation branches, iteratively supervise strong augmentations within the group and the weak branch of the other group for medical image segmentation (Chen et al., 2024).
Cross-labeling supervision for semi-supervised learning (CLS): Dual networks generate both positive and negative pseudo-labels (pseudo and complementary), with adaptive weighting based on prediction entropy; each network is additionally trained using its peer’s high-confidence labels (Yao et al., 2022).
Branchwise pseudo-label exchange in video anomaly detection (CPL-VAD): Binary anomaly localization and category classification branches with distinct strengths exchange refined snippet-level pseudo-labels, each branch guiding the other toward better temporal or semantic localization (Dayeon et al., 19 Feb 2026).
Cross-refine mechanism in audio-visual source localization (XPL): Two models with distinct backbones and augmentation schemes generate soft pseudo-labels with exponential moving average smoothing that are used to supervise each other's predictions, enhanced by a curriculum data selection phase based on prediction similarity (Guo et al., 2024).
Cross-model pseudo-labeling in action recognition (CMPL): Two networks (primary backbone and lightweight auxiliary) with different frame rates or spatial configurations use their own confident predictions to supply pseudo-labels for the other's training on strongly augmented data (Xu et al., 2021).
Cross-lingual pseudo-labeling in ASR (CLPL): An acoustic model trained on a source language pseudo-labels unlabeled audio in a target language, using target-language LM rescoring; the pseudo-labeled target data then trains a target AM, with iterative teacher updates (Likhomanenko et al., 2023).
Cross-task pseudo-label transfer for LLM ICL (GraphSim, GLIP): Cross-task labeled data prompt an LLM to pseudo-label a seed subset of a novel task; label propagation via a graph neural network extends pseudo-labels to the full target pool without additional LLM calls (Chen et al., 28 Oct 2025).
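As one concrete illustration of the instantiations listed above, the peer-disagreement idea behind cross-rectifying detectors can be sketched as follows. This is a simplified sketch under stated assumptions, not the CrossRectify algorithm itself: the `(box, label, score)` tuple layout, the `rectify` function, the IoU threshold of 0.5, and the rule "adopt the peer's label only when it overlaps, disagrees, and is more confident" are all hypothetical choices for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Det = Tuple[Box, int, float]             # (box, class label, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def rectify(own: List[Det], peer: List[Det], iou_thr: float = 0.5) -> List[Det]:
    """Replace a detection's label with the peer's when the matched peer box
    disagrees on the class and is more confident; otherwise keep it unchanged."""
    out: List[Det] = []
    for box, label, score in own:
        match = max(peer, key=lambda d: iou(box, d[0]), default=None)
        if (
            match is not None
            and iou(box, match[0]) >= iou_thr
            and match[1] != label
            and match[2] > score
        ):
            out.append((box, match[1], match[2]))
        else:
            out.append((box, label, score))
    return out


# Hypothetical outputs of two detectors on the same image.
det_a = [((0, 0, 10, 10), 1, 0.6), ((20, 20, 30, 30), 2, 0.9)]
det_b = [((1, 1, 10, 10), 3, 0.8), ((50, 50, 60, 60), 2, 0.7)]

rectified_a = rectify(det_a, det_b)
print([label for _, label, _ in rectified_a])
```

The first box of detector A overlaps a more confident peer box with a different class, so its label is replaced; the second box has no overlapping peer box and is left alone.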

3. Pseudo-Label Filtering, Selection, and Quality Assurance

Effective cross-pseudo-labeling requires rigorous strategies to select and filter exchanged pseudo-labels, aiming to maximize precision while maintaining coverage:

Confidence-Based Thresholding: Most frameworks select a pseudo-label only if the maximum per-class prediction confidence surpasses a threshold (e.g., τ = 0.9 in Xu et al., 2021 and Yao et al., 2022).
Prediction Stability: Memory banks record the sequence of predictions for each sample, so that a pseudo-label is selected only if the output remains stable across epochs (Tian et al., 2023).
Entropy/Uncertainty Weighting: Sample weights derived from normalized prediction entropy soften the impact of low-confidence or ambiguous pseudo-labels (Yao et al., 2022).
Peer Disagreement Analysis: CrossRectify leverages pairwise detector disagreements, preferring predictions with higher maximum confidence among matched proposals (Ma et al., 2022).
Structural/Graph-based Similarity: Cross-domain pseudo-labeling employs graph or GNN-based structural similarity to select the closest source examples for seed pseudo-labels and then propagates via label propagation (Chen et al., 28 Oct 2025).
Soft Labeling and Smoothing: Soft pseudo-labels obtained through label sharpening and exponential moving average mechanisms stabilize training and guard against abrupt erroneous label updates (Guo et al., 2024).
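Three of the filters above (entropy weighting, prediction stability, and exponential moving average smoothing) can be sketched with the standard library alone. The specific formulas are assumptions chosen for illustration: the weight 1 - H(p)/log K is one common way to normalize entropy, and the window size and momentum values are hypothetical, not settings reported in the cited papers.

```python
import math
from typing import List, Sequence


def entropy_weight(p: Sequence[float]) -> float:
    """1 - H(p) / log(K): 1 for a one-hot prediction, 0 for a uniform one."""
    h = -sum(q * math.log(q) for q in p if q > 0)
    return 1.0 - h / math.log(len(p))


def is_stable(history: Sequence[Sequence[float]], window: int = 3) -> bool:
    """True if the argmax class is identical over the last `window` stored epochs."""
    if len(history) < window:
        return False
    recent = [max(range(len(p)), key=lambda k: p[k]) for p in history[-window:]]
    return len(set(recent)) == 1


def ema_update(prev: Sequence[float], new: Sequence[float], momentum: float = 0.9) -> List[float]:
    """Smooth a soft pseudo-label so that one noisy prediction cannot flip it."""
    return [momentum * a + (1.0 - momentum) * b for a, b in zip(prev, new)]


# Hypothetical memory bank: softmax outputs for one sample over four epochs.
bank = [[0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]

print(entropy_weight(bank[-1]))          # weight grows as the prediction sharpens
print(is_stable(bank))                   # class 1 in each of the last three epochs
print(ema_update([1.0, 0.0], [0.0, 1.0]))  # smoothed label moves only slightly
```

In a training loop these would typically be combined: stability and confidence decide whether a sample is exchanged at all, while the entropy weight scales its contribution to the loss.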

4. Training Objectives and Regularization

Most cross-pseudo-labeling methods synthesize loss terms that combine peer-provided pseudo-label supervision with additional regularization, supervised loss, or data consistency constraints:

Cross-Entropy on Cross-Selected or Peer Labels: Each model or branch minimizes the cross-entropy between its output on differently-augmented or new data and the pseudo-label selected by its peer (Yao et al., 2022, Tian et al., 2023).
Consistency Regularization: Techniques such as MixUp or strong/weak augmentation enforce robustness of predictions under intra-sample transformations and between mutually supervising models (Tian et al., 2023, Chen et al., 2024).
Negative/Complementary Learning: CLS integrates negative supervision via complementary labels (argmin channels) to actively push the model away from incorrect predictions (Yao et al., 2022).
Auxiliary Losses and Refinements: Additional terms encourage model similarity, contrastive learning between modalities (XPL), or exploit label propagation consistency (GLIP) (Guo et al., 2024, Chen et al., 28 Oct 2025).
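The first and third objectives above can be combined in a short per-sample sketch. It assumes, for illustration only, that the positive pseudo-label is the peer's argmax (used only above a threshold), that the complementary label is the peer's argmin, and that the negative term takes the common form -log(1 - p); the names `peer_loss` and `neg_weight` are hypothetical and the exact formulation in CLS may differ.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def cross_entropy(p: Sequence[float], y: int) -> float:
    """Positive term: pull probability mass toward class y."""
    return -math.log(max(p[y], EPS))


def complementary_loss(p: Sequence[float], y_neg: int) -> float:
    """Negative term: push probability mass away from class y_neg."""
    return -math.log(max(1.0 - p[y_neg], EPS))


def peer_loss(
    own_probs: Sequence[float],
    peer_probs: Sequence[float],
    tau: float = 0.9,
    neg_weight: float = 1.0,
) -> float:
    """Loss for one sample of one model, supervised by its peer's prediction."""
    classes = range(len(peer_probs))
    pos = max(classes, key=lambda k: peer_probs[k])  # peer's most likely class
    neg = min(classes, key=lambda k: peer_probs[k])  # peer's least likely class
    loss = neg_weight * complementary_loss(own_probs, neg)
    if peer_probs[pos] >= tau:  # positive label only when the peer is confident
        loss += cross_entropy(own_probs, pos)
    return loss


own = [0.7, 0.2, 0.1]
confident_peer = [0.95, 0.04, 0.01]
unsure_peer = [0.5, 0.4, 0.1]

print(peer_loss(own, confident_peer))  # positive and negative terms
print(peer_loss(own, unsure_peer))     # negative term only
```

The negative term remains usable even when the peer is not confident enough to supply a positive label, which is one motivation for complementary supervision.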

Loss weights for cross-supervision and consistency may be annealed or dynamically adjusted depending on the fraction of high-quality selected pseudo-labels, with late-stage training reducing regularization in favor of direct hard-label supervision (Tian et al., 2023).
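A dynamic weight of this kind can be sketched as a function of the selected fraction. The linear rule below is an assumption made for illustration, not the schedule used in any cited work, and `consistency_weight`, `total_loss`, and `lam_max` are hypothetical names.

```python
def consistency_weight(num_selected: int, num_total: int, lam_max: float = 1.0) -> float:
    """Weight on the consistency term: lam_max when nothing is selected yet,
    decaying linearly to 0 once every sample has a selected pseudo-label."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    selected_fraction = num_selected / num_total
    return lam_max * (1.0 - selected_fraction)


def total_loss(
    loss_selected: float,
    loss_consistency: float,
    num_selected: int,
    num_total: int,
    lam_max: float = 1.0,
) -> float:
    """Hard-label loss on selected samples plus a down-weighted consistency term."""
    lam = consistency_weight(num_selected, num_total, lam_max)
    return loss_selected + lam * loss_consistency


# Early training: few selected labels, so consistency regularization dominates.
print(consistency_weight(10, 100))
# Late training: most labels selected, so hard-label supervision dominates.
print(consistency_weight(90, 100))
```

Tying the weight to the selected fraction rather than to the epoch count lets the balance follow the observed pseudo-label coverage instead of a fixed timetable.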

5. Empirical Results and Effects

Across a broad array of domains, cross-pseudo-labeling consistently outperforms corresponding single-model or self-pseudo-labeling baselines, with gains attributable to improved pseudo-label quality, diversity, and confirmation bias mitigation:

Image Classification: CLS yields 2–4% gains in accuracy over FixMatch in low-label regimes on CIFAR-10/100 (Yao et al., 2022).
Object Detection: CrossRectify achieves absolute mAP boosts up to +4.7% (Faster R-CNN on VOC split), surpassing both self-training and state-of-the-art single-teacher frameworks (Ma et al., 2022).
Medical Image Segmentation: DFCPS shows sustained mIoU increases versus single-view consistency and other state-of-the-art segmentation models, particularly under scarce label conditions (Chen et al., 2024).
Video Anomaly Detection: CPL-VAD attains frame-level AP=88.53% (versus 84.51% previously) and higher segmental mAP at all IoU thresholds (Dayeon et al., 19 Feb 2026).
Audio-Visual Source Localization: XPL improves CIoU by 5.3 points on Flickr-SoundNet, outperforming both supervised and vanilla hard pseudo-labeling baselines (Guo et al., 2024).
Action Recognition: CMPL improves Top-1 accuracy by 9–10% over single-model FixMatch in 1%-labeled UCF-101 and Kinetics-400 (Xu et al., 2021).
Unsupervised ASR: Cross-lingual pseudo-labeling reduces WER by 15% absolute compared to end-to-end unsupervised baselines when transferring from German to LJSpeech (Likhomanenko et al., 2023).
In-Context Learning: Cross-task label transfer with graph label propagation for ICL reduces LLM labeling cost by 80% while maintaining performance at the in-task upper bound (Chen et al., 28 Oct 2025).

6. Domain-Specific Adaptations and Extensions

Cross-pseudo-labeling has been tailored to address unique challenges across domains:

Partial-Label Learning: In CroSel, cross-selection exploits memory bank filtration designed to resolve ambiguous candidate-label sets. The integration with MixUp-based soft regularization addresses false selection noise (Tian et al., 2023).
Action and Temporal Domains: Cross-model pseudo-labelers employ differential designs (frame rates, temporal offsets, network width/depth) to maximize the diversity and complementary strengths of peer models (Xu et al., 2021).
Video and Audio-Visual Streams: Cross-pseudo-labeling coordinates between temporally or semantically enriched branches, allowing for accurate boundary localization and semantic disambiguation in weakly-labeled sequence settings (Dayeon et al., 19 Feb 2026, Guo et al., 2024).
Cross-Task/Language Transfer: These variants align source and target domains or languages using structural graph similarity mechanisms or LLM constraints to propagate cross-domain pseudo-labels, minimizing data and compute expense (Likhomanenko et al., 2023, Chen et al., 28 Oct 2025).
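The propagation step in cross-task transfer can be illustrated with a deliberately simple neighbor-voting scheme. This is a stand-in chosen for brevity, not the GNN-based propagation of the cited work: the adjacency-list graph, the `propagate` function, the majority-vote rule, and the seed labels are all assumptions for the example.

```python
from collections import Counter
from typing import Dict, Hashable, List


def propagate(
    adj: Dict[Hashable, List[Hashable]],
    seeds: Dict[Hashable, str],
    max_iters: int = 10,
) -> Dict[Hashable, str]:
    """Spread seed pseudo-labels over a similarity graph by majority vote.

    Each round, every unlabeled node adopts the most common label among its
    already-labeled neighbors; seed labels are never overwritten.
    """
    labels = dict(seeds)
    for _ in range(max_iters):
        updates = {}
        for node, neighbors in adj.items():
            if node in labels:
                continue
            votes = Counter(labels[n] for n in neighbors if n in labels)
            if votes:
                updates[node] = votes.most_common(1)[0][0]
        if not updates:  # nothing left to label, or remaining nodes are unreachable
            break
        labels.update(updates)
    return labels


# Hypothetical similarity graph over five target-task examples.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
seed_labels = {0: "pos", 4: "neg"}  # e.g. the only examples labeled by an LLM

full_labels = propagate(graph, seed_labels)
print(full_labels)
```

Only the two seed examples require an expensive labeling call here; the remaining three are labeled from graph structure alone, which is the cost-saving idea the text describes.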

7. Limitations, Challenges, and Future Directions

While cross-pseudo-labeling demonstrates clear empirical benefits, reported limitations include sensitivity to pseudo-label quality (especially in cross-lingual or cross-task settings), reliance on model diversity, and possible failure modes where both models produce correlated errors or where domain transfer is weak (e.g., highly dissimilar languages (Likhomanenko et al., 2023)). Advances may lie in refining selection/filtration heuristics, further architectural diversification, integrating structured label propagation, and extending to more general multi-agent or federated systems.

Additionally, research into adaptive exchange protocols, robustness to adversarial labeling errors, and scalable applications to foundation models or multi-modal, structured output spaces remains ongoing. The paradigm remains central in modern semi-supervised, weakly supervised, and transfer learning systems due to its consistent improvements in label efficiency and generalization robustness.