- The paper proposes QG-SSL, integrating an independent quality predictor to guide regularization and pseudolabel weighting for improved segmentation.
- It employs synthetic mask degradations to train a ResNet-18-based predictor that achieves high correlation (ρ > 0.92) and low MAE (0.043–0.088) across benchmarks.
- Experimental results reveal consistent DSC improvements over standard SSL methods, demonstrating the framework's architecture-agnostic advantages.
Quality-Guided Semi-Supervised Learning for Medical Image Segmentation
Introduction and Rationale
Medical image segmentation is a central problem in clinical analysis pipelines, and robust deep learning models for it typically require extensive pixel-wise annotations. However, large-scale annotation of medical datasets is prohibitively labor-intensive and expensive. Semi-supervised learning (SSL) mitigates this by exploiting both labeled data and, crucially, abundant unlabeled data. Classic SSL methods typically rely on consistency regularization, pseudolabel selection based on prediction confidence, or contrastive representation objectives. The fundamental limitation of standard pseudolabeling protocols is their reliance on the model's own confidence or uncertainty estimates, which can reinforce the systematic errors of that same model.
This work proposes a quality-guided SSL (QG-SSL) paradigm in which a dedicated, context-aware quality prediction network is integrated into the SSL training loop. This network predicts segmentation quality directly from image-mask pairs, independently of the segmentation model's own confidence. The approach is instantiated through two mechanisms: (1) differentiable quality-aware regularization (QAR) and (2) quality-based pseudolabel sample reweighting (PL-QW). Both are designed as drop-in enhancements that are agnostic to the underlying segmentation architecture and SSL protocol, which facilitates broad adoption.
Figure 1: Framework overview summarizing the two-phase QG-SSL methodology and the diverse experimental configurations.
Methodology
Generation of Variable-Quality Masks and Quality Prediction
To train the segmentation quality predictor, synthetic datasets of degraded masks are constructed in two ways: by applying stochastic corruptions (morphological, geometric, and boundary perturbations) to ground-truth masks, and by sampling realistic degradations from the predictions of partially trained (weak) networks. Each mask, including the uncorrupted ground truth, is labeled with its Dice similarity to the ground truth, which provides a continuous regression target for quality estimation.
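A minimal NumPy sketch of the synthetic-corruption step, assuming binary 2D masks. The specific operators (4-connected dilation and erosion, integer translation, random pixel flips in a boundary band) and their parameter ranges are illustrative stand-ins chosen for this example, not the paper's exact corruption set; weak-network sampling is not covered here.

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity between two binary masks (1.0 when both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 4-connected (cross-shaped) neighbourhood."""
    m = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(m, 1)  # pads with False
        m = p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    return m

def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary erosion with a 4-connected neighbourhood."""
    m = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(m, 1)
        m = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m

def translate(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a mask by (dy, dx) pixels; pixels shifted out of frame are dropped."""
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    src = mask.astype(bool)[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def boundary_noise(mask: np.ndarray, rng: np.random.Generator, p: float) -> np.ndarray:
    """Randomly flip pixels inside a thin band around the mask boundary."""
    m = mask.astype(bool)
    band = dilate(m) & ~erode(m)
    return m ^ (band & (rng.random(m.shape) < p))

def degrade(gt: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply one randomly chosen corruption: morphological, geometric or boundary."""
    op = rng.integers(3)
    if op == 0:
        k = int(rng.integers(1, 4))
        return dilate(gt, k) if rng.random() < 0.5 else erode(gt, k)
    if op == 1:
        dy, dx = (int(v) for v in rng.integers(-5, 6, size=2))
        return translate(gt, dy, dx)
    return boundary_noise(gt, rng, p=float(rng.uniform(0.1, 0.5)))

def make_quality_samples(gt: np.ndarray, n: int, seed: int = 0):
    """Return [(mask, dice_to_gt)], starting with the ground truth itself (Dice 1)."""
    rng = np.random.default_rng(seed)
    samples = [(gt.astype(bool), 1.0)]
    for _ in range(n):
        m = degrade(gt, rng)
        samples.append((m, dice(m, gt)))
    return samples

# Usage: a rectangular toy lesion and four corrupted variants with their Dice targets.
gt = np.zeros((64, 64), dtype=bool)
gt[16:48, 20:44] = True
for mask, q in make_quality_samples(gt, n=4):
    print(int(mask.sum()), round(q, 3))
```

Each (mask, Dice) pair becomes one regression example; drawing many corruptions per ground-truth mask spreads the targets over the full [0, 1] quality range.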
The quality predictor gϕ, a ResNet-18 backbone with a regression head, is trained to map an (image, degraded mask) pair to an estimate of the mask's Dice score. Including the outputs of weakly trained networks is critical: they close the distributional gap between synthetic corruptions and the error patterns that neural networks actually produce during training, which helps the predictor generalize to real pseudo-segmentations.
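The sketch below shows one way the two degradation sources could be pooled into a single regression dataset. It assumes that each weak network is available as a callable returning a per-pixel foreground-probability map and that predictions are binarized at 0.5; `build_training_pool`, `ProbModel`, and the threshold are hypothetical names and choices for this example, not taken from the paper.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: image -> per-pixel foreground probability map.
ProbModel = Callable[[np.ndarray], np.ndarray]

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

def build_training_pool(
    image: np.ndarray,
    gt: np.ndarray,
    synthetic_masks: Sequence[np.ndarray],
    weak_models: Sequence[ProbModel],
    threshold: float = 0.5,
) -> List[Tuple[np.ndarray, np.ndarray, float]]:
    """Pool (image, mask, Dice) triples from synthetic corruptions and weak networks."""
    pool = [(image, m.astype(bool), dice(m, gt)) for m in synthetic_masks]
    for predict in weak_models:
        mask = predict(image) >= threshold  # binarize the weak network's output
        pool.append((image, mask, dice(mask, gt)))
    return pool
```

Mixing both sources means the regression targets reflect the kinds of errors pseudo-segmentations tend to contain, not only hand-designed perturbations.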
Integration into Semi-Supervised Segmentation Frameworks
The QG-SSL framework delivers two complementary integration strategies:
- Quality-Aware Regularization (QAR): Predictions of the segmentation model fθ on unlabeled images are passed through the frozen quality predictor gϕ, and the regularization loss is $\mathcal{L}_{\text{QAR}} = \sum_j \big(1 - g_\phi(x_j^u, f_\theta(x_j^u))\big)$, where $x_j^u$ denotes the $j$-th unlabeled image. Gradients are backpropagated through the frozen quality estimator into the segmentation model (only θ is updated), driving the segmentation model toward outputs that the independent quality network rates as high quality.
- Quality-Weighted Pseudolabels (PL-QW): Standard pseudolabeling is retained, but each sample's loss is weighted by its predicted segmentation quality, which reduces the influence of unreliable pseudo-annotations. This mechanism is compatible with any pseudolabel-generating SSL algorithm (e.g., mean teacher, cross-pseudo supervision, interpolation consistency, contrastive learning) and requires no modification to the segmentation network or the SSL pipeline.
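The two mechanisms can be sketched as loss functions over a batch of quality scores, each score standing for gϕ(x, mask). This NumPy version only computes loss values; in actual training they would be written in an autodiff framework so that the QAR gradient flows through the frozen gϕ into fθ. Using binary cross-entropy as the pseudolabel loss and normalizing PL-QW by the sum of weights are assumptions of this sketch.

```python
import numpy as np

def qar_loss(quality_scores) -> float:
    """QAR term: sum over unlabeled samples of (1 - predicted quality)."""
    q = np.asarray(quality_scores, dtype=float)
    return float(np.sum(1.0 - q))

def per_sample_bce(prob, target, eps: float = 1e-7) -> np.ndarray:
    """Mean pixel-wise binary cross-entropy for each sample of an (N, H, W) batch."""
    p = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    t = np.asarray(target, dtype=float)
    ce = -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    return ce.reshape(ce.shape[0], -1).mean(axis=1)

def plqw_loss(prob, pseudolabels, quality_scores) -> float:
    """PL-QW: pseudolabel loss with each sample weighted by its predicted quality.

    Normalizing by the total weight is an assumption of this sketch.
    """
    w = np.asarray(quality_scores, dtype=float)
    losses = per_sample_bce(prob, pseudolabels)
    return float(np.sum(w * losses) / max(float(np.sum(w)), 1e-7))
```

A sample whose pseudolabel receives quality 0 contributes nothing to the PL-QW loss, while in QAR a low-quality prediction adds close to 1 to the penalty.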
Experimental Validation
Datasets and Architectures
The proposed approach was validated across five benchmark medical segmentation datasets (PH2, SCD, DMF, COL, CLI) spanning dermatology and colonoscopy, leveraging distinct unlabeled sources (e.g., ISIC2020, Polyp-Box-Seg). Experiments covered a range of architectures for the segmentation backbone (fθ): U-Net++, Attention U-Net, and Swin-U-Net, representing both convolutional and transformer-based designs.
The quality predictor attains a mean absolute error (MAE) between 0.043 and 0.088 and a Pearson correlation coefficient ρ > 0.92 on all datasets, supporting the claim that it estimates relative segmentation quality in a context-aware fashion rather than by memorizing mask statistics.
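The two reported metrics capture different properties: MAE measures absolute agreement between predicted quality and true Dice, whereas Pearson's ρ measures linear agreement and ignores any constant offset. A minimal NumPy sketch of both (the function name is illustrative):

```python
import numpy as np

def quality_metrics(true_dice, predicted_quality):
    """Return (MAE, Pearson's rho) between true Dice scores and predicted quality."""
    t = np.asarray(true_dice, dtype=float)
    p = np.asarray(predicted_quality, dtype=float)
    mae = float(np.mean(np.abs(p - t)))
    rho = float(np.corrcoef(t, p)[0, 1])
    return mae, rho

# A predictor that overestimates every Dice score by 0.05 ranks samples perfectly
# (rho of about 1.0) yet still has an MAE of about 0.05.
true_dice = [0.2, 0.5, 0.8, 0.9]
print(quality_metrics(true_dice, [d + 0.05 for d in true_dice]))
```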
Figure 2: Dice score versus predicted quality estimate on the CLI test set, showing a strong linear correlation (Pearson's ρ=0.69). Inset: annotated visual examples showing the correspondence between ground-truth and predicted quality for exemplary (B, C) and failure (A, D) cases.
Extensive cross-architecture and cross-dataset benchmarks demonstrate that both QAR and PL-QW yield consistent, statistically significant improvements over all standard SSL methods, including strong student-teacher (MT, UA-MT), cross-pseudo supervision (CPS), and contrastive learning baselines. Notably:
- On the PH2 dataset with U-Net++, QAR achieves 95.48±0.48% DSC, surpassing baseline SSL (e.g., 94.20±0.71% DSC for CPS, 94.33±0.62% DSC for UA-MT).
- The sample reweighting via PL-QW obtains comparable gains, and its plug-and-play nature means any SSL framework producing pseudolabels can immediately benefit from independent quality supervision.
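For scale, the absolute DSC gains of QAR over the two baselines quoted for PH2 with U-Net++ work out as follows (in percentage points):

```latex
\Delta_{\text{CPS}} = 95.48 - 94.20 = 1.28\ \text{pp}, \qquad
\Delta_{\text{UA-MT}} = 95.48 - 94.33 = 1.15\ \text{pp}
```

Both gains exceed the reported standard deviations of the individual runs (0.48 to 0.71 pp).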
Ablation studies show that three factors are critical for the best performance: including weak-model degradations when training the quality predictor, keeping the number of synthetic masks per sample within an appropriate range, and using moderate regularization weights. Enlarging the unlabeled pool mainly accelerates convergence rather than improving final accuracy.
Practical and Theoretical Implications
The QG-SSL framework decouples the estimation of segmentation quality from the representations of the main model, which directly targets the miscalibration and overconfidence of medical segmentation networks, a persistent concern in prior literature. The method is architecture-agnostic: the quality predictor is trained separately and requires neither modification of the primary segmentation pipeline nor end-to-end coupled training with it, in contrast to previous approaches.
The general applicability opens several avenues:
- Active Learning: The independently trained quality predictor could drive sample selection, prioritizing unlabeled samples whose pseudo-segmentations receive low predicted quality.
- Multi-Class and Multi-Label Extensions: The present instantiation targets binary segmentation; extending it to multi-class and multi-label scenarios should be straightforward in principle but remains to be evaluated.
- Orthogonal Combination with Architectural Innovations: Because quality-weighted supervision is modular, it can be combined with other advances (e.g., transformer backbones, self-training with knowledge distillation), potentially improving on the current state of the art.
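As a hedged illustration of the active-learning idea, the simplest acquisition rule would rank unlabeled samples by predicted quality and send the lowest-scoring ones for annotation. The function `select_for_annotation` and the fixed `budget` are hypothetical; the source does not specify an acquisition strategy.

```python
import numpy as np

def select_for_annotation(quality_scores, budget: int) -> list:
    """Indices of the `budget` unlabeled samples with the lowest predicted quality."""
    q = np.asarray(quality_scores, dtype=float)
    order = np.argsort(q, kind="stable")  # ascending: worst predicted quality first
    return [int(i) for i in order[:budget]]

# Samples 3 and 1 have the poorest predicted pseudo-segmentations.
print(select_for_annotation([0.9, 0.2, 0.6, 0.1], budget=2))
```

In practice one might combine this with diversity criteria, but ranking by predicted quality alone already uses the predictor independently of the segmentation model's confidence.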
Conclusion
The QG-SSL paradigm demonstrates the utility of independent, context-sensitive quality prediction for improving the efficacy of SSL in medical image segmentation. The empirical improvements, observed across diverse data regimes and segmentation architectures, indicate that explicit estimation of segmentation quality makes better use of unlabeled data than model-centric confidence regularization. This work positions quality-guided supervision as a promising principle for future SSL, active learning, and hybrid clinical AI systems.