Online Pseudo Labeling

Updated 25 May 2026

Online pseudo-labeling is a framework that dynamically generates and refines surrogate labels for unlabeled data during model training, enabling real-time adaptation to nonstationary distributions.
It employs teacher-student methodologies with momentum updates and regret minimization to maintain pseudo-label quality and handle domain shifts effectively.
Empirical applications in ASR, object detection, and test-time adaptation demonstrate significant improvements in accuracy, scalability, and memory efficiency.

Online pseudo-labeling is a framework for generating and updating surrogate labels for unlabeled data samples dynamically during the training of machine learning models. This paradigm is central to modern semi-supervised learning, online domain adaptation, test-time adaptation, and learning from weak labels. Online pseudo-labeling algorithms generate, refine, and leverage pseudo-labels on the fly, often via an auxiliary (teacher) model, statistical heuristics, or explicit regularization, which lets models exploit the structure of unlabeled data with high computational and memory efficiency.

1. Conceptual Foundations

Online pseudo-labeling operates in contrast to traditional offline or batch pseudo-labeling, where a fixed or periodically re-trained model generates labels for all unlabeled data before or between main training epochs. In the online setting, pseudo-labels are generated and re-evaluated at each mini-batch or epoch, often by leveraging the current or a momentum-averaged model. This enables continuous adaptation and real-time handling of nonstationary distributions, label distribution drifts, or streaming data scenarios.

Multiple works have formalized and validated this approach in diverse settings:

In semi-supervised automatic speech recognition (ASR), online pseudo-labeling via mean teacher or momentum ensembles substantially improves scalability and domain robustness (Higuchi et al., 2021, Higuchi et al., 2021).
For universal domain adaptation with streaming target data and no source access, online pseudo-labeling is essential for maintaining accuracy under category shift, memory constraints, and continuous domain evolution (Schlachter et al., 2024, Schlachter et al., 2025).

2. Algorithmic Structures and Update Mechanisms

Two dominant algorithmic structures underpin online pseudo-labeling: teacher-student frameworks with moving-average (momentum) updates, and fully online model self-training with dynamic sample selection or regret-minimization.

Teacher-Student / Momentum Frameworks

Here, an online (student) model is trained using pseudo-labels from a slower-moving teacher, whose parameters are obtained either via an exponential moving average (EMA) of the student or via periodic synchronization with it. The prototypical EMA update rule is:

$\theta_{\text{teacher}} \leftarrow m \cdot \theta_{\text{teacher}} + (1 - m) \cdot \theta_{\text{student}}$

where $m$ is the momentum parameter, typically $m \in [0.9, 0.999]$ (Higuchi et al., 2021, Van et al., 2022). This mechanism smooths the teacher's evolution, yielding more stable pseudo-labels. Pseudo-labels are then generated on-the-fly for current unlabeled samples and used to compute an unsupervised loss term.
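The update rule above can be sketched in a few lines. This is a minimal illustration, not the implementation of any cited paper: parameters are held in plain dictionaries of float lists as a stand-in for model tensors, and the function name `ema_update` is hypothetical.

```python
def ema_update(teacher, student, m=0.999):
    """Return teacher parameters after one momentum (EMA) step.

    teacher, student: dicts mapping parameter names to lists of floats
    (an assumed stand-in for real model tensors).
    m: momentum; values close to 1 make the teacher move slowly.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum m must lie in [0, 1]")
    return {
        name: [m * t + (1.0 - m) * s for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 0.0]}
teacher = ema_update(teacher, student, m=0.9)
# teacher["w"] is now approximately [0.1, 0.9]
```

With $m = 0.9$ the teacher moves one tenth of the way toward the student per step; larger values of $m$ trade adaptation speed for pseudo-label stability.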

Variants include:

Periodically updated teacher (rather than continuous EMA), which separates pseudo-label generation from student learning, e.g., by copying the student into the teacher at the minima of a cosine learning-rate schedule (Tang et al., 2024).
Integration with domain adaptation: the teacher adapts concurrently with the student, supporting test-time adaptation via paired-view consistency or domain-specific-block updates (Yu et al., 2024).
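The periodic variant can be illustrated by locating the steps at which a cosine schedule bottoms out. The sketch below assumes a cosine schedule that restarts every `cycle_len` steps; the source does not specify the exact schedule, and `cosine_lr`, `teacher_sync_steps`, and all default values are hypothetical.

```python
import math


def cosine_lr(step, cycle_len, lr_max=0.1, lr_min=0.0):
    """Cosine-annealed learning rate that restarts every `cycle_len` steps."""
    pos = step % cycle_len
    cos_term = math.cos(math.pi * pos / (cycle_len - 1))
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos_term)


def teacher_sync_steps(total_steps, cycle_len):
    """Steps at which the schedule reaches its minimum (last step of each cycle)."""
    return [s for s in range(total_steps) if s % cycle_len == cycle_len - 1]


sync_steps = teacher_sync_steps(total_steps=12, cycle_len=4)
# sync_steps == [3, 7, 11]; at these steps the student weights
# would be copied into the teacher before pseudo-labeling resumes.
```

Copying only at schedule minima means the teacher always reflects a student that has just finished a low-learning-rate phase, rather than one in the middle of a large-step update.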

Fully Online Self-Labeling and Regret Minimization

Some frameworks eschew a separate teacher and instead maintain an explicit online assignment of pseudo-labels, updated at each epoch via a decision-theoretic criterion (e.g., follow-the-perturbed-leader to minimize regret). For instance, in Learning from Label Proportions, the pseudo-label matrix for each bag is selected to minimize the cumulative unlikelihood cost plus a perturbation, under the constraint that pseudo-label proportions match the known bag proportions (Matsuo et al., 2023). This enables online adaptation and theoretical guarantees on the gap to optimal pseudo-label assignments.
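A minimal sketch of follow-the-perturbed-leader selection for a single bag is given below, restricted to the binary case so that the proportion constraint can be satisfied exactly by sorting. The uniform perturbation, the `eta` scale, and the function name `fpl_assign` are assumptions for illustration and are not taken from Matsuo et al.

```python
import random


def fpl_assign(cum_cost, num_positive, eta=1.0, rng=None):
    """Pick binary pseudo-labels for one bag with follow-the-perturbed-leader.

    cum_cost[i][c]: cumulative cost so far of giving instance i the label c.
    num_positive:   number of instances that must receive label 1
                    (derived from the known bag proportion).
    eta:            scale of the uniform perturbation (hypothetical default).
    """
    rng = rng or random.Random(0)
    # Perturb every cumulative cost, then score each instance by how much
    # cheaper label 1 is than label 0.
    gain = []
    for i, (c0, c1) in enumerate(cum_cost):
        p0 = c0 + eta * rng.random()
        p1 = c1 + eta * rng.random()
        gain.append((p0 - p1, i))
    # With two classes and a fixed count, the cheapest feasible assignment
    # gives label 1 to the instances with the largest gain.
    gain.sort(reverse=True)
    positives = {i for _, i in gain[:num_positive]}
    return [1 if i in positives else 0 for i in range(len(cum_cost))]


cum_cost = [[5.0, 0.0], [0.0, 5.0], [4.0, 1.0], [1.0, 4.0]]
labels = fpl_assign(cum_cost, num_positive=2, eta=0.5)
# labels == [1, 0, 1, 0]
```

In a training loop, `cum_cost` would be incremented after each epoch with the model's current per-instance cost, so the assignment tracks the history of predictions rather than only the latest ones.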

3. Loss Functions and Adaptation Dynamics

Loss formulations in online pseudo-labeling fall into three broad categories:

Supervised Loss: Applied on labeled data, typically standard cross-entropy or task-specific objectives such as the Tversky loss for segmentation (Van et al., 2022).
Unsupervised/Pseudo-Label Loss: Enforces consistency with the pseudo-labels that the current teacher or model assigns to unlabeled samples. Examples include:
- Cross-entropy to hard pseudo-labels.
- CTC loss in ASR for student outputs w.r.t. teacher-generated transcriptions (Higuchi et al., 2021, Berrebbi et al., 2022).
- Symmetric cross-entropy to teacher soft-labels under paired-view augmentations (Yu et al., 2024).
Contrastive and Consistency Regularization: Especially in domain adaptation, feature-level losses (supervised contrastive, KL divergence sharpening) facilitate discrimination even under label noise or category shift (Schlachter et al., 2024, Schlachter et al., 2025).
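Two of the pseudo-label losses listed above can be written out for a single sample. The sketch below assumes both teacher and student outputs are already probability vectors, and takes symmetric cross-entropy to be the sum of the cross-entropy in both directions; the exact weighting used in the cited works may differ, and the function names are hypothetical.

```python
import math


def cross_entropy(target, pred, eps=1e-12):
    """H(target, pred) = -sum_k target[k] * log(pred[k])."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def symmetric_cross_entropy(student, teacher):
    """Sum of the cross-entropy taken in both directions."""
    return cross_entropy(teacher, student) + cross_entropy(student, teacher)


def hard_label_cross_entropy(student, teacher, eps=1e-12):
    """Cross-entropy to the teacher's argmax (hard pseudo-label)."""
    k = max(range(len(teacher)), key=teacher.__getitem__)
    return -math.log(max(student[k], eps))


teacher_probs = [0.7, 0.2, 0.1]
student_probs = [0.5, 0.3, 0.2]
soft_loss = symmetric_cross_entropy(student_probs, teacher_probs)
hard_loss = hard_label_cross_entropy(student_probs, teacher_probs)
# hard_loss equals -log(0.5), since the teacher's argmax is class 0
```

The hard-label loss depends only on the teacher's top class, whereas the symmetric form uses the full teacher distribution.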

A key empirical finding is the trade-off between pseudo-label accuracy and quantity: contrastive and consistency-based losses are substantially more robust to label noise, while cross-entropy requires high-quality, high-confidence pseudo-labels to avoid collapse (Schlachter et al., 2025).

4. Specific Instantiations and Experimental Results

Momentum Pseudo-Labeling in ASR

The momentum pseudo-labeling (MPL) algorithm in end-to-end ASR maintains a pair of models, an online (student) model and an offline (teacher) model, with the teacher generating pseudo-labels via greedy CTC decoding. The teacher is updated by momentum averaging of the student's weights. Empirical results demonstrate up to 80% recovery of the supervised WER gap, consistent improvements over offline or iterative PL, and robustness to domain mismatch (Higuchi et al., 2021, Higuchi et al., 2021).
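Greedy CTC decoding, the pseudo-label generation step named above, takes the most likely symbol per frame, merges consecutive repeats, and removes blanks. A minimal sketch follows, assuming the blank symbol has index 0 and using toy per-frame probabilities; the function name is hypothetical.

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_probs: list of per-frame probability lists over the vocabulary.
    blank:       index of the CTC blank symbol (assumed to be 0 here).
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for token in best_path:
        if token != prev and token != blank:
            decoded.append(token)
        prev = token
    return decoded


frame_probs = [
    [0.1, 0.8, 0.1],    # token 1
    [0.1, 0.7, 0.2],    # token 1 again (merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # token 1 (kept, because a blank separates it)
    [0.1, 0.2, 0.7],    # token 2
]
pseudo_label = ctc_greedy_decode(frame_probs)
# pseudo_label == [1, 1, 2]
```

Because no beam search or language model is involved, this decoding is cheap enough to run on every unlabeled mini-batch.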

Online Pseudo-Label Unified Object Detection

In large-scale multi-dataset object detection, periodic teacher updates at cosine schedule minima, together with category-specific regression and pseudo-label RPN heads, resolve the cross-dataset missing-annotation challenge. The method achieves SOTA mean mAP and improves RPN recall by 0.5% on COCO-Split5 (Tang et al., 2024).

Online Pseudo-Labeling with Regret Minimization

For learning from label proportions, an online FPL algorithm assigns pseudo-labels to match constraints and minimize cumulative regret. This approach yields stable classification accuracy for arbitrarily large bags, in contrast to competitors that degrade sharply with bag size (Matsuo et al., 2023).

Online Test-Time Adaptation

The DPLOT framework applies paired-view mean-teacher pseudo-labeling and entropy minimization only to shallow, domain-specific blocks, preventing model collapse and yielding SOTA results in online, continual TTA for vision benchmarks (Yu et al., 2024).

Online GMM-Driven Pseudo-Labeling for Universal Domain Adaptation

Memory-efficient online source-free domain adaptation is achieved by maintaining and updating class-conditional feature statistics in a GMM, enabling reliable entropy- and likelihood-based OOD detection and robust pseudo-labeling at $O(Kd)$ memory cost for $K$ classes and $d$-dimensional features (Schlachter et al., 2024).
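The $O(Kd)$ memory claim can be illustrated with running per-class statistics. The sketch below assumes diagonal covariances and uses Welford's incremental update; both are choices made for illustration, and the cited method may maintain its GMM parameters differently. The class name `RunningClassStats` is hypothetical.

```python
class RunningClassStats:
    """Per-class running mean and diagonal variance (Welford's algorithm).

    Stores two length-d vectors and a count per class, so memory grows as
    O(K * d) and no past features are kept.
    """

    def __init__(self, num_classes, dim):
        self.count = [0] * num_classes
        self.mean = [[0.0] * dim for _ in range(num_classes)]
        # Running sum of squared deviations from the mean, per dimension.
        self.m2 = [[0.0] * dim for _ in range(num_classes)]

    def update(self, feature, label):
        """Fold one pseudo-labeled feature vector into its class statistics."""
        self.count[label] += 1
        n = self.count[label]
        mean, m2 = self.mean[label], self.m2[label]
        for j, x in enumerate(feature):
            delta = x - mean[j]
            mean[j] += delta / n
            m2[j] += delta * (x - mean[j])

    def variance(self, label):
        """Population variance per dimension (zeros if the class is empty)."""
        n = self.count[label]
        return [v / n if n > 0 else 0.0 for v in self.m2[label]]


stats = RunningClassStats(num_classes=2, dim=2)
for feat, pl in [([1.0, 2.0], 0), ([3.0, 4.0], 0), ([10.0, 10.0], 1)]:
    stats.update(feat, pl)
# stats.mean[0] == [2.0, 3.0] and stats.variance(0) == [1.0, 1.0]
```

Each incoming batch updates the statistics in place and can then be discarded, which is what makes the approach suitable for streaming target data.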

5. Stabilization, Quality Control, and Practical Guidelines

Practical online pseudo-labeling systems must implement mechanisms to manage pseudo-label drift, label noise, and evolving domain statistics:

History-based PL curation: Using token-level edit distance, curriculum-based cache update rules, or entropy thresholding to retain/replace pseudo-labels as their consistency with current predictions decreases (Berrebbi et al., 2022).
Selective High-Confidence Assignment: Empirically, selecting only high-confidence (low-entropy) pseudo-labels substantially outperforms maximizing quantity at the expense of quality (Schlachter et al., 2025).
Update Schedules: Synchronizing teacher updates with student optima via learning-rate scheduling, or tuning the EMA momentum parameter, balances rapid adaptation with stability (Tang et al., 2024, Van et al., 2022).
Loss Selection: When high-confidence pseudo-labels cannot be guaranteed, contrastive or consistency losses provide much greater robustness than cross-entropy (Schlachter et al., 2025).
Block-specific Updates: Restricting adaptation to domain-specific layers (e.g., via data-driven block selection) avoids catastrophic drift (Yu et al., 2024).
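The selective high-confidence guideline above can be sketched as an entropy filter over a batch of predictions. The normalization by $\log K$ and the threshold value of 0.5 are assumptions chosen for illustration, not values from the cited works; the function names are hypothetical.

```python
import math


def normalized_entropy(probs, eps=1e-12):
    """Shannon entropy divided by log(K), so the result lies in [0, 1]."""
    h = -sum(p * math.log(max(p, eps)) for p in probs)
    return h / math.log(len(probs))


def select_confident(batch_probs, max_entropy=0.5):
    """Return (index, pseudo_label) pairs whose normalized entropy is low enough."""
    selected = []
    for i, probs in enumerate(batch_probs):
        if normalized_entropy(probs) <= max_entropy:
            label = max(range(len(probs)), key=probs.__getitem__)
            selected.append((i, label))
    return selected


batch_probs = [
    [0.96, 0.02, 0.02],  # confident: kept
    [0.34, 0.33, 0.33],  # near uniform: rejected
    [0.05, 0.90, 0.05],  # confident: kept
]
selected = select_confident(batch_probs, max_entropy=0.5)
# selected == [(0, 0), (2, 1)]
```

Lowering `max_entropy` yields fewer but cleaner pseudo-labels, which is the quality-over-quantity trade-off described in the guidelines.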

6. Comparative Table of Online Pseudo-Labeling Strategies

| Setting / Algorithm | Pseudo-Label (PL) Generation | Teacher Update / Selection | Loss Function(s) |
|---|---|---|---|
| Momentum Pseudo-Labeling ASR (Higuchi et al., 2021, Higuchi et al., 2021) | Greedy CTC decode from EMA teacher model | Online EMA (momentum), or periodic | CTC + supervised, no LM needed |
| LLP by Regret Minimization (Matsuo et al., 2023) | Optimization over possible label assignments | FPL (Follow the Perturbed Leader) | Cross-entropy, combinatorial regret min. |
| Unified Obj. Detection (Tang et al., 2024) | Periodic teacher inference for unannotated objects | Parameter copy at cosine schedule minima | Binary CE, category-specific box, RPN losses |
| Online TTA (DPLOT) (Yu et al., 2024) | Mean-teacher EMA on paired views (flipping) | EMA at every batch; update only selected blocks | Entropy min. + symmetric CE consistency |
| Memory-Efficient SF-UniDA (Schlachter et al., 2024) | Argmax over current model, OOD gating by GMM | GMM params updated per batch | KL divergence to sharpened targets + contrastive |

7. Theoretical Properties and Limitations

Several algorithms provide formal guarantees (notably regret-based approaches), ensuring that the online pseudo-label selection is competitive on average with the best assignment in hindsight. In LLP with online FPL, the expected cumulative regret over $T$ epochs is bounded by $O(|B^i|\sqrt{T\ln |\mathcal Y^i|})$, which is sublinear in $T$, so the average per-epoch regret decreases asymptotically to zero (Matsuo et al., 2023).
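The step from a $\sqrt{T}$ bound to vanishing per-epoch regret can be made explicit. Reading the bound above as a bound on the cumulative regret after $T$ epochs, written $R_T$ here (notation introduced for illustration, not taken from the source), dividing by $T$ gives:

$\frac{R_T}{T} = O\!\left(\frac{|B^i|\sqrt{T\ln |\mathcal Y^i|}}{T}\right) = O\!\left(|B^i|\sqrt{\frac{\ln |\mathcal Y^i|}{T}}\right) \xrightarrow{\;T \to \infty\;} 0$

The average regret therefore shrinks at rate $1/\sqrt{T}$ for a fixed bag size $|B^i|$, while the cumulative regret itself continues to grow.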

Other methods highlight specific stability or failure modes:

MPL and DPLOT stability is sensitive to the EMA momentum and to the handling of normalization statistics, especially in low-resource or non-i.i.d. settings (Higuchi et al., 2021, Yu et al., 2024).
Cross-entropy adaptation in domain transfer is highly brittle to pseudo-label noise; contrastive losses offer more tolerance (Schlachter et al., 2025).
In object detection, periodic (rather than EMA) teacher updates ensure that pseudo-labels are generated using a well-optimized student, maximizing their reliability for later self-training (Tang et al., 2024).

Open challenges include further improving pseudo-label quality under large domain and category shift, integrating sample-level confidence estimates, and handling catastrophic drift in fully streaming settings.

References:

(Higuchi et al., 2021, Higuchi et al., 2021, Van et al., 2022, Berrebbi et al., 2022, Matsuo et al., 2023, Yu et al., 2024, Schlachter et al., 2024, Tang et al., 2024, Schlachter et al., 2025)