SemiGDA: Generative Dual-distribution Alignment for Semi-Supervised Medical Image Segmentation

Published 25 Apr 2026 in cs.CV | (2604.23274v1)

Abstract: Semi-supervised learning addresses label scarcity and high annotation costs in medical image segmentation by exploiting the latent information in unlabeled data to enhance model performance. Traditional discriminative segmentation relies on segmentation masks, neglecting feature-level distribution constraints. This limits robust semantic representation learning and adaptive modeling of unlabeled data in scenarios with few labels. To address these limitations, we propose SemiGDA, a novel Generative Dual-distribution Alignment framework for semi-supervised medical image segmentation. Our SemiGDA overcomes the reliance of discriminative methods on large labeled datasets by aligning feature and semantic distributions to boost semantic learning and scene adaptability. Specifically, we propose a Dual-distribution Alignment Module (DAM), which employs two structurally distinct encoders to model image and mask feature distributions. It enforces their alignment in the latent space via distributional constraints, establishing structured feature consistency. Moreover, we design a Consistency-Driven Skip Adapter (CDSA) strategy, which introduces dual skip adapters (Image and Mask) to fuse multi-scale features via skip connections. Using a consistency loss, CDSA enhances cross-branch semantic alignment and reinforces fine-grained semantic consistency. Experimental results on diverse medical datasets show that our method outperforms other state-of-the-art semi-supervised segmentation methods. Code is released at: https://github.com/taozh2017/SemiGDA.


Authors (5)

Summary

The paper introduces a generative dual-encoder framework that aligns Gaussian prior distributions for images and masks to enhance segmentation performance.
It integrates a Dual-distribution Alignment Module and Consistency-Driven Skip Adapter to achieve notable Dice score improvements across multiple medical benchmarks.
Empirical results show that SemiGDA is robust under sparse labeling, outperforming SOTA methods on colonoscopy, skin lesion, histopathology, and ultrasound datasets.


Motivation and Context

Medical image segmentation requires substantial labeled data, which is expensive and labor-intensive to obtain in clinical domains. Semi-supervised learning (SSL) mitigates annotation scarcity by leveraging unlabeled samples, but existing SSL segmentation strategies remain predominantly discriminative, focusing on per-pixel classification or basic consistency losses. These paradigms are prone to overfitting and generalize poorly, especially in low-label regimes, because they make limited use of high-level feature distributions and semantic priors.

Methodological Contributions

SemiGDA (2604.23274) introduces a generative paradigm for SSL segmentation that directly targets the limitations of discriminative models under annotation constraints. Its core methodological innovations are:

Generative Dual-encoder Framework: SemiGDA employs two structurally distinct encoders: a frozen VAE-based encoder that extracts prior feature distributions for images and masks, and a trainable encoder that learns discriminative details. This design decouples generic feature representation from task-specific discriminative information.
Dual-distribution Alignment Module (DAM): Through a latent mapping model, DAM explicitly aligns the latent feature spaces of images and masks. Gaussian prior distributions are modeled for both images and masks, and their alignment is regularized by a mean squared error (MSE) on the latent representations. This alignment supports robust global semantic modeling, which is crucial for segmentation under sparse supervision.
Consistency-Driven Skip Adapter (CDSA): This decoder-side design integrates skip connections from both encoders at multiple scales, employing lightweight adapters for image and mask features. CDSA is regularized with a Dice loss on both labeled and unlabeled data, enforcing semantic consistency at multiple spatial resolutions across the generative branches.
Annotation Conversion and Reversion (ACR): To ensure architectural compatibility between target mask formats and generative model input, ACR normalizes mask targets for VAE consumption and reverses this transformation after decoding, preserving semantic integrity.
Unified Objective: Supervised and unsupervised components are combined, with the latent distribution alignment and segmentation losses weighted by a Gaussian warm-up schedule. This yields a staged curriculum in which unsupervised constraints are enforced progressively more strongly as training proceeds.
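The ACR and DAM steps can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the frozen VAE expects three-channel inputs scaled to [-1, 1] (the convention of Stable Diffusion-style autoencoders), that latents are sampled from a diagonal Gaussian with the reparameterization trick, and that the DAM alignment term is a plain MSE between the latent predicted from the image branch and the mask latent. All function names are hypothetical.

```python
import numpy as np


def mask_to_vae_input(mask: np.ndarray) -> np.ndarray:
    """Hypothetical ACR conversion: binary (H, W) mask -> (3, H, W) array in [-1, 1].

    Assumes the frozen VAE was trained on RGB images normalized to [-1, 1].
    """
    m = (mask > 0).astype(np.float32)          # enforce {0, 1}
    scaled = m * 2.0 - 1.0                     # map {0, 1} -> {-1, 1}
    return np.repeat(scaled[None], 3, axis=0)  # replicate into 3 "RGB" channels


def vae_output_to_mask(recon: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Hypothetical ACR reversion: (3, H, W) decoder output -> binary (H, W) mask."""
    return (recon.mean(axis=0) > threshold).astype(np.uint8)


def reparameterize(mu: np.ndarray, logvar: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw z ~ N(mu, exp(logvar)) as mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def latent_alignment_loss(z_pred: np.ndarray, z_mask: np.ndarray) -> float:
    """Assumed DAM term: MSE between the image-derived latent and the mask latent."""
    return float(np.mean((z_pred - z_mask) ** 2))
```

Repeating the mask across channels lets an RGB-trained autoencoder consume it without architectural changes, and thresholding the channel mean at 0 inverts the mapping exactly for clean reconstructions; a real decoder output would be noisier, which is why the reversion thresholds rather than rounds.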
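The unified objective can be sketched in the same way. The warm-up below uses the common Gaussian ramp-up form w(t) = lambda_max * exp(-5 (1 - t)^2), where t is the elapsed fraction of the ramp-up period. The exact schedule, the choice to scale only the unsupervised terms, and the use of a soft Dice for the CDSA consistency term are assumptions for illustration; all names are hypothetical.

```python
import numpy as np


def gaussian_warmup(step: int, rampup_steps: int, max_weight: float = 1.0) -> float:
    """Gaussian ramp-up: max_weight * exp(-5 * (1 - t)^2), t = step / rampup_steps in [0, 1]."""
    if rampup_steps <= 0:
        return max_weight
    t = np.clip(step / rampup_steps, 0.0, 1.0)
    return float(max_weight * np.exp(-5.0 * (1.0 - t) ** 2))


def soft_dice_loss(prob: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """1 - soft Dice between a probability map and a (binary or soft) target.

    Usable against labels (supervised) or, hypothetically, between the predictions
    of the image-adapter and mask-adapter branches (CDSA consistency).
    """
    inter = np.sum(prob * target)
    denom = np.sum(prob) + np.sum(target)
    return float(1.0 - (2.0 * inter + eps) / (denom + eps))


def total_loss(sup_seg: float, sup_align: float,
               unsup_consistency: float, unsup_align: float,
               step: int, rampup_steps: int, lam_max: float = 1.0) -> float:
    """Assumed weighting: supervised terms at full weight, unsupervised terms ramped up."""
    w = gaussian_warmup(step, rampup_steps, lam_max)
    return sup_seg + sup_align + w * (unsup_consistency + unsup_align)
```

Early in training the weight is about exp(-5), roughly 0.007, so the model first fits the labeled data; by the end of the ramp-up the unsupervised alignment and consistency terms count fully. The smooth shape avoids a sudden jump in the loss when unlabeled data start to matter.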

Empirical Results

Quantitative experiments were conducted on six medical image segmentation benchmarks spanning four imaging domains: CVC-300, CVC-ClinicDB, and Kvasir (colonoscopy), ISIC-2018 (dermoscopy), BCSS (histopathology), and BUSI (breast ultrasound). SemiGDA consistently outperformed 11 competitive state-of-the-art (SOTA) semi-supervised segmentation models at both main labeling ratios (10% and 30%).

Colonoscopy and ISIC-2018: On Kvasir with 10% labeled data, SemiGDA achieved an 83.03% Dice score, a +1.84% improvement over the best prior method and up to +8.89% over the weakest of the recent SOTA baselines. On ISIC-2018, it outperformed the most competitive baseline by +0.53% Dice (from 85.75% to 86.28%) and reduced 95HD from 4.81 to 4.70.
BCSS and BUSI: Both Dice and IoU improved markedly. On BCSS (10% labeled), SemiGDA achieved 74.05% Dice (+2.10% over CSCPA), with further gains at higher labeling ratios. On BUSI with 10% labels, a particularly challenging ultrasound dataset, it achieved 75.57% Dice, substantially higher than the previous best approaches.
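For reading these results, the two overlap metrics can be computed per image as below. This sketch covers only binary Dice and IoU (95HD requires a boundary distance computation and is omitted), and since the averaging protocol is not described here, per-image computation is an assumption; the function name is hypothetical.

```python
import numpy as np


def dice_and_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Binary Dice = 2|P & G| / (|P| + |G|) and IoU = |P & G| / |P | G|."""
    p = pred.astype(bool)
    g = gt.astype(bool)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    total = p.sum() + g.sum()
    if total == 0:  # both masks empty: treat as perfect agreement by convention
        return 1.0, 1.0
    return float(2 * inter / total), float(inter / union)
```

For a single prediction, Dice = 2 IoU / (1 + IoU), so the two metrics rank individual masks identically; once averaged over a dataset they can diverge slightly, which is why papers typically report both.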

Ablation studies showed that DAM and CDSA each contribute substantial performance gains on their own (e.g., on BUSI, Dice rose from a 70.48% baseline to 75.57% with all components active). Including the unsupervised losses, particularly those operating in the latent feature domain, notably improved segmentation under low supervision. SemiGDA also remained strong at an extremely low label rate (1%), highlighting its robustness in low-resource scenarios.

Theoretical and Practical Implications

By shifting from a per-pixel discriminative focus to explicit dual-distribution alignment in a generative framework, SemiGDA operationalizes a more semantically consistent and robust approach to semi-supervised medical image segmentation. Modeling and regularizing latent prior distributions of both images and masks introduces feature-level constraints that are more informative than mask-level supervision alone, especially when labeled data are sparse.

Practically, SemiGDA builds on a generative backbone pretrained on large-scale natural images, whose priors appear to generalize well across domains and can be further fine-tuned for domain adaptation. The dual skip adapters preserve and align multi-scale spatial dependencies for both labeled and unlabeled data, yielding sharper boundaries and better structural integrity even on challenging clinical datasets.

Theoretically, the design advances SSL segmentation by tightly coupling the generative paradigm with discriminative objectives, addressing the limitations of existing adversarial and pseudo-label methods, which suffer from training instability and error accumulation, respectively.

Future Directions

SemiGDA demonstrates the promise of generative-discriminative hybrids for medical image segmentation. Future research may explore:

Extending dual-distribution alignment to 3D medical imaging modalities.
Integrating text or multimodal priors, allowing for open-vocabulary or prompt-driven segmentation in the SSL setting.
Adapting DAM and CDSA principles to transformer-based backbones and diffusion-based generative models for richer latent semantics and fine-grained detail synthesis.
Algorithmic optimization for computational scalability with larger image sizes and increased architectural complexity.

Conclusion

SemiGDA (2604.23274) introduces a generative dual-distribution alignment strategy for semi-supervised medical image segmentation, coupling a dual-encoder architecture with explicit latent distribution alignment and multi-scale semantic consistency. It achieves superior performance across multiple datasets and labeling regimes, providing both practical improvements for low-label medical applications and theoretical advances in the integration of generative models within SSL segmentation frameworks.
