Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes (2408.17421v1)

Published 30 Aug 2024 in eess.IV and cs.CV

Abstract: Semantic segmentation of medical images is pivotal in applications like disease diagnosis and treatment planning. While deep learning has excelled in automating this task, a major hurdle is the need for numerous annotated segmentation masks, which are resource-intensive to produce due to the required expertise and time. This scenario often leads to ultra low-data regimes, where annotated images are extremely limited, posing significant challenges for the generalization of conventional deep learning methods on test images. To address this, we introduce a generative deep learning framework, which uniquely generates high-quality paired segmentation masks and medical images, serving as auxiliary data for training robust models in data-scarce environments. Unlike traditional generative models that treat data generation and segmentation model training as separate processes, our method employs multi-level optimization for end-to-end data generation. This approach allows segmentation performance to directly influence the data generation process, ensuring that the generated data is specifically tailored to enhance the performance of the segmentation model. Our method demonstrated strong generalization performance across 9 diverse medical image segmentation tasks and on 16 datasets, in ultra-low data regimes, spanning various diseases, organs, and imaging modalities. When applied to various segmentation models, it achieved performance improvements of 10-20% (absolute), in both same-domain and out-of-domain scenarios. Notably, it requires 8 to 20 times less training data than existing methods to achieve comparable results. This advancement significantly improves the feasibility and cost-effectiveness of applying deep learning in medical imaging, particularly in scenarios with limited data availability.

Summary

  • The paper's central contribution is showing that end-to-end joint optimization of a conditional GAN with segmentation models yields up to 20% absolute performance gains in ultra low-data settings.
  • The approach integrates data synthesis and segmentation training via multi-level optimization, reducing required annotated samples by 8-20 times compared to traditional methods.
  • Empirical evaluations across diverse medical imaging tasks show robust, backbone-agnostic improvements without relying on external unlabeled data.

GenSeg: End-to-End Generative Data Synthesis for Medical Image Segmentation with Limited Labeled Data

The paper "Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes" (2408.17421) introduces GenSeg, a generative deep learning framework intended to address medical image segmentation tasks when only a very limited number of annotated samples are available. Contrary to conventional data augmentation or semi-supervised segmentation approaches, GenSeg formalizes data synthesis and segmentation model training as a single, end-to-end multi-level optimization (MLO) problem. The generative model’s architecture, implemented via conditional GANs with differentiable architecture search, is optimized using the downstream segmentation validation loss, providing explicit performance-oriented feedback for the synthetic data generation process.

Background and Motivation

Medical semantic segmentation models generally require substantial quantities of expertly labeled images, a requirement that is impractical in many settings due to both annotation complexity (per-pixel masks) and regulatory or data-availability constraints. Classical data augmentation and semi-supervised learning methods have limitations: the former treats augmentation and segmentation independently, often yielding marginal gains in ultra low-sample regimes, while the latter presumes access to corpora of unlabeled data, which are often unavailable due to privacy or IRB restrictions.

GenSeg directly addresses this by:

  • Generating high-fidelity, paired mask-image data that is optimized for its utility to segmentation models.
  • Eliminating the dependency on external unlabeled images.
  • Integrating generative model learning (mask-to-image) and segmentation model training into an end-to-end framework, such that segmentation performance directly influences the generative process via MLO, as formalized in the sketch below.
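
Schematically, and with notation assumed here rather than quoted from the paper (A for the generator's architecture variables, W_G and W_S for generator and segmenter weights, γ for the real/synthetic trade-off, and D̂(W_G, A) for the synthetic pairs), the multi-level problem can be written as three nested levels, which Stages I–III in the Methodology section optimize in turn:

```latex
\begin{aligned}
\text{(I)}\quad  & W_G^{*}(A) = \arg\min_{W_G}\; \mathcal{L}_{\mathrm{adv}}\!\left(W_G, A;\ \mathcal{D}_{\mathrm{train}}\right) \\
\text{(II)}\quad & W_S^{*}(A) = \arg\min_{W_S}\; \mathcal{L}_{\mathrm{seg}}\!\left(W_S;\ \mathcal{D}_{\mathrm{train}}\right)
                   + \gamma\, \mathcal{L}_{\mathrm{seg}}\!\left(W_S;\ \widehat{\mathcal{D}}\!\left(W_G^{*}(A), A\right)\right) \\
\text{(III)}\quad & \min_{A}\; \mathcal{L}_{\mathrm{seg}}\!\left(W_S^{*}(A);\ \mathcal{D}_{\mathrm{val}}\right)
\end{aligned}
```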

Methodology

Architecture

GenSeg consists of two core modules:

  1. Data Generation Model — A conditional GAN (Pix2Pix backbone with learnable architecture) mapping augmented masks to medical images. The architecture of this generator is optimized through differentiable architecture search (a DARTS-like methodology), allowing search over operator types (convolution, kernel sizes, up-convolutions, etc.); a minimal sketch of such a searchable operation follows this list.
  2. Segmentation Model — Any standard segmentation model (e.g., UNet, DeepLab, SwinUnet).
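
To make the searchable-generator idea concrete, below is a minimal PyTorch sketch of a DARTS-style mixed operation; the candidate operator set, class name, and discretization helper are illustrative assumptions, not the paper's exact search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style searchable operation: a softmax over architecture weights
    alpha blends candidate operators during search; after search, only the
    operator with the largest alpha is kept (discretization)."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # 3x3 conv
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),  # 5x5 conv
            nn.Identity(),                                            # skip
        ])
        # One architecture weight per candidate operator (alpha_{i,k}).
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def discretize(self) -> nn.Module:
        """Retain only the operator with maximal alpha, as done after search."""
        return self.ops[int(self.alpha.argmax())]
```

In GenSeg, operations of this kind populate the encoding and decoding blocks of the Pix2Pix generator, so the generator's topology itself is subject to the validation-driven updates described next.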

Data Generation Pipeline

  • Reverse Generation: Starting from real, expert-annotated masks, perform domain-appropriate mask augmentations (rotation, flipping, translation), and generate corresponding synthetic images via the GAN.
  • Joint Training (MLO):
    • Stage I: Fix architecture parameters, optimize GAN weights (G, H) via adversarial loss on real mask-image pairs.
    • Stage II: Use the generator to produce augmented image-mask pairs which, together with the original data, are used to update the segmentation model via segmentation loss.
    • Stage III: Evaluate segmentation on real validation data; the validation loss is then used to update the generator’s architecture parameters via gradient descent.
  • This process is iterated, such that segmentation utility feedback flows into the generative model architecture and weight updates. One-step approximations of gradient updates are employed to efficiently backpropagate validation loss through generator parameters to architecture-choice variables; a schematic iteration follows this list.
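
The sketch below renders one iteration of the three stages in PyTorch. The helper loss and augmentation functions, the conditional discriminator interface, the exposure of the generator's architecture variables through a separate `opt_arch` optimizer, and the specific one-step hypergradient in Stage III are all simplified assumptions standing in for the paper's implementation.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0

def seg_loss(logits, masks):
    # Stand-in segmentation loss (the paper uses cross-entropy variants).
    return F.binary_cross_entropy_with_logits(logits, masks)

def augment(masks):
    # Stand-in for the paper's rotation / flipping / translation augmentations.
    return torch.flip(masks, dims=[-1])

def genseg_iteration(gen, disc, seg, train_batch, val_batch,
                     opt_d, opt_g, opt_seg, opt_arch, lr_seg=1e-3, gamma=0.5):
    """One iteration of the three-stage scheme (schematic). opt_d/opt_g cover
    the GAN's weight parameters; opt_arch covers the generator's architecture
    variables (the alpha weights discussed under Implementation Details)."""
    imgs, masks = train_batch

    # Stage I: adversarial update of GAN weights; architecture is not stepped.
    fake = gen(masks)
    d_real, d_fake = disc(imgs, masks), disc(fake.detach(), masks)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    d_gen = disc(fake, masks)
    loss_g = F.binary_cross_entropy_with_logits(d_gen, torch.ones_like(d_gen))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Stage II: segmenter update on real plus synthetic pairs, weighted by gamma.
    aug_masks = augment(masks)
    syn_imgs = gen(aug_masks)                     # graph kept for Stage III
    loss_s = (seg_loss(seg(imgs), masks)
              + gamma * seg_loss(seg(syn_imgs.detach()), aug_masks))
    opt_seg.zero_grad(); loss_s.backward(); opt_seg.step()

    # Stage III: one-step hypergradient. Re-express the Stage II update as a
    # differentiable function of the architecture variables, then push the
    # validation loss back through it.
    params = dict(seg.named_parameters())
    inner = (seg_loss(functional_call(seg, params, (imgs,)), masks)
             + gamma * seg_loss(functional_call(seg, params, (syn_imgs,)), aug_masks))
    grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
    updated = {n: p - lr_seg * g for (n, p), g in zip(params.items(), grads)}
    v_imgs, v_masks = val_batch
    loss_val = seg_loss(functional_call(seg, updated, (v_imgs,)), v_masks)
    opt_arch.zero_grad(); loss_val.backward(); opt_arch.step()
```

The detach in Stage II keeps the segmenter update cheap, while Stage III rebuilds the same update functionally so the validation loss can reach the architecture variables; the paper iterates these stages until convergence.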

Implementation Details

  • Generator: Pix2Pix conditional GAN with differentiable architecture search enabled for both encoding and decoding blocks.
  • Architecture search: Weights α_{i,k} select among candidate operators in each cell; the final architecture is composed by retaining the operators with maximal α values (formalized after this list).
  • Losses: Cross-entropy for segmentation and adversarial losses; a trade-off hyperparameter γ balances real and synthetic data contributions.
  • Optimizers: Adam and RMSprop, standard weight decays and learning rates; best validation performance snapshot adopted for model selection.
  • Experiments: Training performed using A100 GPUs, with each experimental configuration repeated three times for performance reporting.
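
In formulas (with notation assumed, matching the sketches above), the softmax relaxation over candidate operators o_k on cell edge i, the post-search discretization, and the γ-weighted training objective read:

```latex
% Softmax relaxation and post-search discretization:
\bar{o}_i(x) = \sum_{k} \frac{\exp(\alpha_{i,k})}{\sum_{k'} \exp(\alpha_{i,k'})}\, o_k(x),
\qquad
k_i^{*} = \arg\max_{k}\, \alpha_{i,k}

% Gamma-weighted objective over real and synthetic pairs:
\mathcal{L}_{\mathrm{seg}}^{\mathrm{total}}
  = \mathcal{L}_{\mathrm{seg}}\!\left(\mathcal{D}_{\mathrm{real}}\right)
  + \gamma\, \mathcal{L}_{\mathrm{seg}}\!\left(\mathcal{D}_{\mathrm{syn}}\right)
```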

Empirical Evaluation

Datasets and Settings

Segmentation was evaluated on 9 tasks from 16 public datasets, including a wide variety of organs, diseases, and imaging modalities: skin lesion segmentation (ISIC, PH2, DermIS, DermQuest), lung segmentation (JSRT, NLM-MC, NLM-SZ, COVID-QU-Ex), breast ultrasound, placental vessel, polyp, foot ulcer, intraretinal cystoid fluid, left ventricle, and myocardial wall segmentation. All settings focused on ultra low-data regimes (between 8 and 100 labeled images).

Main Results

  • Absolute Performance Improvement: For standard segmentation models with minimal training data, GenSeg consistently delivered 10-20% absolute performance gains (Dice/Jaccard metrics) in both in-domain and out-of-domain scenarios. Example: With only 40–50 samples, GenSeg-DeepLab outperformed DeepLab by 20.6% (placental vessels), 14.5% (skin lesions), and 11.3% (intraretinal cystoid fluid), among others.
  • Sample Efficiency: 8-20-fold reductions in required annotated samples were achieved to match baseline model performance. For instance, DeepLab needed 500 placental vessel images to reach a Dice of 0.51, compared to GenSeg-DeepLab requiring only 50 examples for the same performance.
  • Out-of-Domain Robustness: GenSeg maintained superior performance with minimal supervised data in cross-domain settings. For example, GenSeg-UNet achieved a Jaccard index of 0.65/0.77 on DermIS/PH2 vs. UNet’s 0.41/0.56 when trained on 40 ISIC images.
  • Backbone Agnosticism: Substantial improvements were observed not only on UNet and DeepLab but also with transformer-based SwinUnet.

Ablation and Baseline Comparisons

  • Versus Traditional Augmentation: GenSeg consistently outperformed rotation, flipping, translation, their compositions, and WGAN-based generative augmentation. In skin lesion segmentation on PH2, GenSeg with 40 ISIC training images outperformed the best baseline (Flip) by 9% absolute Dice score.
  • Versus Semi-Supervised Methods: GenSeg exceeded performance of CTBCT, DCT, and MCF, even when these baselines absorbed 1000 external unlabeled images. GenSeg was superior despite using no additional unlabeled data.
  • End-to-End Benefit: Training the generative and segmentation models separately (no joint optimization) led to significantly worse results: e.g., on placental vessel segmentation, GenSeg-DeepLab’s in-domain Dice score exceeded the "Separate" baseline by 10%.
  • Model Diversity & Search: Incorporating mask-to-image generators with learnable architectures (Pix2Pix, SPADE) further improved synthetic data quality over ASAPNet variants; multi-operation augmentation (rotation, translation, flipping) delivered better generalization, especially out of domain.
  • Computational Cost: Designed for low data availability, total training time per model was under 2 GPU-hours (A100), with no increase in segmentation model inference cost.

Theoretical and Practical Implications

The GenSeg architecture emphasizes several theoretical strengths:

  • Performance-Driven Synthetic Data: By aligning data generation targets with downstream segmentation model performance, GenSeg avoids much of the waste associated with generic data augmentation techniques that ignore downstream utility.
  • Integrated Architecture Search: Differentiable search within the GAN generator allows dynamic model adaptation for heterogeneous anatomical and imaging distributions, improving mask-image plausibility and task-specific sample efficiency.
  • Elimination of Unlabeled Data Dependency: By requiring only a small set of annotated examples, GenSeg makes deep segmentation feasible in environments with severe data-sharing or annotation constraints.

Practically, GenSeg lowers the resource and time barriers to deploying high-fidelity image segmentation in clinical and biomedical environments, where acquiring a few dozen expert-annotated samples is realistic, but large corpus curation remains infeasible due to privacy and logistical hurdles.

Limitations and Future Directions

While GenSeg demonstrates clear improvements in ultra low-data regimes within medical imaging, several limitations invite future work:

  • Scalability to very high-resolution volumetric or 3D images is not established.
  • The quality of generated images is tightly coupled to the diversity and representativeness of input masks and the search space of the generative model architecture.
  • Extension beyond segmentation (e.g., classification or detection) or to non-medical domains is not explored but is likely feasible.

Further research into improving scalability, integrating diffusion-based generative models, and combining GenSeg with federated learning for privacy-preserving distributed training could be fruitful. Optimization of multi-level and meta-learning algorithms for even more rapid adaptation and joint optimization in non-stationary clinical environments remains an open research area.

Conclusion

GenSeg rigorously demonstrates that generative AI, when directly optimized for downstream segmentation efficacy via end-to-end multi-level learning, dramatically enhances model sample efficiency, performance, and robustness across diverse medical imaging modalities in extreme low-data settings. The explicit feedback from the segmentation objective to the data generator marks a significant methodological advance over previous augmentation and semi-supervised approaches, establishing a new state-of-the-art for annotation-efficient medical image segmentation. The open-source implementation further supports practical adoption in research and deployment settings.
