Pseudo-Label Unmixing (PLU) in Instance Segmentation
- Pseudo-Label Unmixing (PLU) is a method that detects and decomposes merged pseudo-labels arising from densely overlapping instances.
- It extends Mask R-CNN with an OverlapJudge head and a decomposition branch to correct label noise in semi-supervised learning.
- PLU achieves near fully supervised accuracy on organoid microscopy data using only 10% of labeled examples, demonstrating scalable label efficiency.
Pseudo-Label Unmixing (PLU) is a targeted framework for overcoming label noise in semi-supervised instance segmentation, specifically addressing the widespread problem of pseudo-label mergers in images containing densely overlapping objects. Primarily developed for organoid microscopy data, where overlapping instances often confound instance-level segmentation, PLU introduces a two-stage solution: explicit detection of merged pseudo-labels and their subsequent decomposition into constituent object masks. Integrated within a Synthesis-Assisted Semi-Supervised Learning (SA-SSL) paradigm, PLU leverages corrective unmixing to generate high-fidelity supervision on both real and synthetic data, attaining near fully supervised performance using a fraction of labeled examples (Huang et al., 10 Jan 2026).
1. Problem Formulation and Notation
Let the image domain be $\Omega \subset \mathbb{R}^2$, with ground-truth instance annotations $Y = \{M_1, \dots, M_K\}$, where each $M_k \in \{0,1\}^{H \times W}$ is a binary mask for one object. A small labeled set $\mathcal{D}_L$ and a large unlabeled set $\mathcal{D}_U$ ($|\mathcal{D}_L| \ll |\mathcal{D}_U|$) are used. A teacher network $f_T$ yields per-instance mask probabilities $\hat{p}_k \in [0,1]^{H \times W}$. Pseudo-label masks are obtained by thresholding $\hat{p}_k$ at a pixel threshold $\tau_{\mathrm{pix}}$; boxes are retained if their confidence exceeds $\tau_{\mathrm{box}}$. The instance overlap ratio,

$$r_{ij} = \frac{|M_i \cap M_j|}{\min(|M_i|, |M_j|)},$$

defines "severely overlapping" masks for $r_{ij} > \tau_{\mathrm{ov}}$, with $i \neq j$. Standard pseudo-labeling frequently merges two overlapped instances into a single noisy mask. PLU's explicit objective is to detect these erroneous labels and recover the correct instance decomposition.
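The overlap ratio is straightforward to compute when masks are represented as sets of pixel coordinates. A minimal sketch (the 0.5 severity threshold here is illustrative, not a value from the paper):

```python
def overlap_ratio(mask_a, mask_b):
    """Instance overlap ratio: |A ∩ B| / min(|A|, |B|).

    Masks are sets of (row, col) pixel coordinates; in practice
    these would come from binarized mask probability maps.
    """
    inter = len(mask_a & mask_b)
    return inter / min(len(mask_a), len(mask_b))

# Two toy masks sharing 2 pixels.
a = {(0, 0), (0, 1), (1, 0), (1, 1)}
b = {(1, 0), (1, 1), (2, 0)}

r = overlap_ratio(a, b)   # 2 / min(4, 3) = 2/3
severe = r > 0.5          # illustrative severity threshold
print(round(r, 3), severe)
```

Normalizing by the smaller mask (rather than the union, as in IoU) makes the ratio sensitive to a small instance being swallowed by a larger one, which is exactly the merger failure mode PLU targets.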
2. Detection of Erroneous Pseudo-Labels
PLU augments the Mask R-CNN architecture with an Overlap-Judgement ("OverlapJudge") head. For every region of interest (RoI), features are extracted to predict a scalar confidence $s \in [0,1]$ that the mask is "correct," i.e., unmerged. Supervision is administered with a binary target $t \in \{0,1\}$, established by comparing intersection-over-union (IoU) against true single-instance and merged-instance ground truth: $t = 1$ if the pseudo-mask matches a single instance more closely than a merged union, and $t = 0$ otherwise. The loss is a standard binary cross-entropy:

$$\mathcal{L}_{\mathrm{OJ}} = -\big[t \log s + (1 - t)\log(1 - s)\big].$$

At inference, any RoI with $s < \tau_{\mathrm{OJ}}$ is flagged for unmixing.
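A minimal sketch of the OverlapJudge supervision signal, assuming the IoU-comparison rule described above (the set-based mask representation and the function names are illustrative):

```python
import math

def iou(mask_a, mask_b):
    """Intersection-over-union of two pixel-coordinate sets."""
    return len(mask_a & mask_b) / len(mask_a | mask_b)

def overlap_judge_target(pseudo_mask, single_gt, merged_gt):
    """Binary target t: 1 ("correct") if the pseudo-mask matches a
    single instance better than the merged union of instances."""
    return 1 if iou(pseudo_mask, single_gt) >= iou(pseudo_mask, merged_gt) else 0

def bce(s, t, eps=1e-7):
    """Binary cross-entropy on the OverlapJudge score s."""
    s = min(max(s, eps), 1 - eps)
    return -(t * math.log(s) + (1 - t) * math.log(1 - s))

single = {(0, 0), (0, 1)}
merged = {(0, 0), (0, 1), (3, 0), (3, 1)}
pseudo = {(0, 0), (0, 1), (3, 0), (3, 1)}  # covers both instances

t = overlap_judge_target(pseudo, single, merged)
print(t)  # 0 -> this pseudo-label is a merger and gets flagged
```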
3. Instance Decomposition (“Unmixing”)
For every flagged RoI, the corresponding feature map is processed by a decomposition branch that predicts up to $N$ candidate instance masks $\{\hat{m}_1, \dots, \hat{m}_N\}$, plus a confidence vector $c \in [0,1]^N$ for sub-instance existence.
- Instance count loss: Each $c_n$ predicts whether the $n$-th sub-instance exists, with binary cross-entropy supervision against the true sub-instance count $K^{*}$. For maximum splits $N$:

$$\mathcal{L}_{\mathrm{cnt}} = -\sum_{n=1}^{N}\Big[\mathbb{1}[n \le K^{*}]\log c_n + \mathbb{1}[n > K^{*}]\log(1 - c_n)\Big].$$

- Mask alignment loss: Predicted sub-masks are optimally matched to ground truth via the Hungarian algorithm, minimizing

$$\mathcal{L}_{\mathrm{align}} = \sum_{k=1}^{K^{*}} \mathcal{L}_{\mathrm{BCE}}\big(\hat{m}_{\sigma(k)}, m_k\big),$$

where $K^{*}$ is the true number of sub-instances and $\sigma$ is the optimal assignment.
The final step replaces the single merged mask by the set $\{\hat{m}_{\sigma(1)}, \dots, \hat{m}_{\sigma(K^{*})}\}$.
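The optimal assignment can be sketched with an exhaustive search, which is equivalent to the Hungarian algorithm for the small $N$ used here (the paper itself would use a proper Hungarian solver; maximizing total IoU stands in for minimizing the BCE alignment cost):

```python
from itertools import permutations

def iou(a, b):
    return len(a & b) / len(a | b)

def match_submasks(preds, gts):
    """Exhaustive Hungarian-style matching: for small N, try every
    injective assignment gt_k -> pred_{perm[k]} and keep the one
    with the highest total IoU."""
    best_perm, best_score = None, -1.0
    for perm in permutations(range(len(preds)), len(gts)):
        score = sum(iou(preds[p], gts[k]) for k, p in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

# Two true sub-instances, three predicted candidates (one spurious).
gts = [{(0, 0), (0, 1)}, {(2, 0), (2, 1)}]
preds = [{(2, 0), (2, 1)}, {(0, 0), (0, 1)}, {(5, 5)}]

perm, score = match_submasks(preds, gts)
print(perm, score)  # (1, 0) 2.0 -> pred 1 matches gt 0, pred 0 matches gt 1
```

The spurious third candidate is simply left unmatched; in PLU it would additionally be suppressed by a low existence confidence $c_n$.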
4. PLU Losses and Optimization
The total loss used in Mask R-CNN with PLU for fully supervised training is:

$$\mathcal{L} = \lambda_{\mathrm{cls}}\mathcal{L}_{\mathrm{cls}} + \lambda_{\mathrm{box}}\mathcal{L}_{\mathrm{box}} + \lambda_{\mathrm{mask}}\mathcal{L}_{\mathrm{mask}} + \lambda_{\mathrm{OJ}}\mathcal{L}_{\mathrm{OJ}} + \lambda_{\mathrm{dec}}\mathcal{L}_{\mathrm{dec}},$$

where
- $\mathcal{L}_{\mathrm{cls}}$: focal loss for classification,
- $\mathcal{L}_{\mathrm{box}}$: Smooth-L1 bounding box regression,
- $\mathcal{L}_{\mathrm{mask}}$: mask pixel-wise cross-entropy,
- the weights $\lambda$ are typically set to 1 for balancing.
In semi-supervised learning with SA-SSL:

$$\mathcal{L}_{\mathrm{SSL}} = \mathcal{L}_{\mathrm{sup}} + \mathcal{L}_{\mathrm{unsup}} + \mathcal{L}_{\mathrm{syn}},$$

with losses computed identically across real, pseudo-labeled, and synthetic images, leveraging the PLU correction for each flagged RoI.
5. Semi-Supervised Training Pipeline and Integration
The training algorithm proceeds as follows:
- Initialization: Train the teacher $f_T$ on the labeled set $\mathcal{D}_L$ with $\mathcal{L}$.
- Pseudo-label generation: For each image in $\mathcal{D}_U$, obtain detections via $f_T$, binarize masks at $\tau_{\mathrm{pix}}$, and retain detections with confidence above $\tau_{\mathrm{box}}$.
- PLU Correction: For each RoI, compute the score $s$ using OverlapJudge; if $s < \tau_{\mathrm{OJ}}$, apply decomposition, replacing the merged mask with decomposed masks $\hat{m}_n$ for all $n$ whose existence confidence $c_n$ exceeds its threshold.
- Image Synthesis: Convert corrected masks to contours and, optionally, apply instance-level augmentations (scale, rotation, and shift). Generate synthetic images with a generator $G$ (pix2pixHD).
- Student update: Each mini-batch comprises 4 labeled, 2 unlabeled, and several synthetic images. The student is updated using $\mathcal{L}_{\mathrm{SSL}}$.
- Teacher update: Update the teacher weights as an exponential moving average (EMA) of the student weights after each iteration.
- Repeat until convergence.
- Repeat until convergence.
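The EMA teacher update in the pipeline above can be sketched with plain dicts standing in for parameter tensors; the decay value 0.9 is illustrative (the paper's decay rate is not given in this summary):

```python
def ema_update(teacher, student, alpha=0.9):
    """Exponential moving average teacher update:
    theta_T <- alpha * theta_T + (1 - alpha) * theta_S.

    Parameters are plain name -> float dicts here; in practice they
    would be network weight tensors updated in place.
    """
    return {k: alpha * teacher[k] + (1 - alpha) * student[k]
            for k in teacher}

teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher["w"])  # 0.9
```

Because the teacher lags the student, its pseudo-labels change smoothly between iterations, which stabilizes the self-training loop.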
6. Synthesis, Augmentation, and Diversity Control
After PLU correction, high-fidelity pseudo-labels are used for both real and synthetic data. For synthesis, masks are transformed into binary contour representations and input to a pix2pixHD GAN generator trained with the standard pix2pixHD objective, an adversarial term combined with a feature-matching term weighted by $\lambda_{\mathrm{FM}}$:

$$\min_G \max_D \; \mathcal{L}_{\mathrm{GAN}}(G, D) + \lambda_{\mathrm{FM}}\,\mathcal{L}_{\mathrm{FM}}(G, D).$$

Instance-level augmentations prior to synthesis increase sample diversity. Distributional alignment between real and synthetic domains is monitored via the Fréchet Inception Distance (FID):

$$\mathrm{FID} = \|\mu_r - \mu_s\|_2^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_s - 2(\Sigma_r \Sigma_s)^{1/2}\big),$$

where $(\mu_r, \Sigma_r)$ and $(\mu_s, \Sigma_s)$ are the mean and covariance of Inception features for real and synthetic images. Empirical results show that moderate augmentation, particularly scaling, achieves the best trade-off between diversity and FID.
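For intuition, the FID formula simplifies nicely when the feature covariances are assumed diagonal, since $(\Sigma_r \Sigma_s)^{1/2}$ becomes an element-wise square root. A minimal sketch under that simplifying assumption (full FID uses a matrix square root over dense covariances of Inception-v3 features):

```python
import math

def fid_diagonal(mu_r, sig_r, mu_s, sig_s):
    """FID under diagonal covariances:
    ||mu_r - mu_s||^2 + sum(sig_r + sig_s - 2*sqrt(sig_r * sig_s)).

    mu_* are per-dimension feature means, sig_* per-dimension variances.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(mu_r, mu_s))
    tr = sum(r + s - 2 * math.sqrt(r * s) for r, s in zip(sig_r, sig_s))
    return d2 + tr

# Identical feature distributions -> FID of 0.
print(fid_diagonal([0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0
# Shifting one mean by 1 adds exactly 1 to the distance.
print(fid_diagonal([0.0], [1.0], [1.0], [1.0]))  # 1.0
```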
7. Key Hyperparameters and Implementation Guidelines
<details> <summary>Table: Core Hyperparameters in PLU</summary>
| Component | Value(s) | Purpose/Scope |
|---|---|---|
| Box confidence threshold $\tau_{\mathrm{box}}$ | see paper | Filter low-confidence masks |
| Pixel threshold $\tau_{\mathrm{pix}}$ | see paper | Binarize mask logits |
| Overlap ratio threshold $\tau_{\mathrm{ov}}$ | see paper | Severe overlap detection |
| Max sub-instances $N$ (decomposition) | see paper | Limit for predicted instance splits |
| IA ranges (shift, rotation, scale) | see paper | Stochastic augmentation |
| Backbone | ResNet-50 FPN | Detection/segmentation architecture |
| Optimizer | SGD (momentum 0.9) | All stages |
| Initial learning rate | 0.001 | All stages |
| Iterations, decay | 180k (decay at 80%, 90%) | Full training schedule |
| Batch composition | 4 labeled / 2 unlabeled | Semi-supervised learning |
| Synthesis model | pix2pixHD | Synthetic image generation |
| $\lambda_{\mathrm{FM}}$ (GAN weighting) | Dataset-dependent | Feature matching in GAN loss |
</details>
To implement PLU, Mask R-CNN should be extended with the OverlapJudge and decomposition heads, trained with the losses $\mathcal{L}_{\mathrm{OJ}}$, $\mathcal{L}_{\mathrm{cnt}}$, and $\mathcal{L}_{\mathrm{align}}$ described above. Hyperparameters and augmentation regimes should mirror those listed.
8. Empirical Results and Impact
PLU yields segmentation accuracy on par with fully supervised models while utilizing only 10% labeled data. It substantially improves detection and separation of overlapping instances, validated through rigorous ablation across two organoid datasets. The method demonstrates that addressing instance label error at both pseudo-label and synthesis stages enables scalable, label-efficient analysis suitable for high-throughput biomedical imaging workflows (Huang et al., 10 Jan 2026).
A plausible implication is that PLU may generalize to other domains where instance overlap corrupts pseudo-label accuracy, not limited to biomedical imagery. Its modular design allows seamless integration into SA-SSL frameworks with minimal architectural changes and direct benefit for synthetic training pipelines.