
RadImageGAN: Multi-Modal Synthetic Imaging

Updated 16 February 2026
  • RadImageGAN is a multi-modal generative adversarial network that synthesizes high-resolution CT, MRI, and endoscopy images using large-scale datasets and class-conditioning.
  • It employs progressive resizing, style modulation, and conditional training to achieve high image fidelity and robust anatomical/pathological diversity.
  • Integration with BigDatasetGAN enables pixel-wise segmentation, enhancing data augmentation and transfer learning for complex medical imaging tasks.

RadImageGAN is a multi-modal, dataset-scale generative adversarial network (GAN) designed to synthesize high-resolution, multi-class medical images spanning computed tomography (CT), magnetic resonance imaging (MRI), and endoscopy. Trained on the large-scale RadImageNet database and extended with BigDatasetGAN for pixel-wise segmentation annotations, RadImageGAN enables the generation of synthetic labeled datasets across 12 anatomical regions and 130 pathological classes. The system is engineered to address the scarcity and annotation cost issues inherent to medical imaging datasets by supporting diverse data augmentation and transfer learning for downstream segmentation tasks (Liu et al., 2023).

1. Dataset Foundation and Preprocessing

RadImageGAN utilizes subsets of RadImageNet and HyperKvasir for training two generator instances:

  • RadImageGAN-CT/MR: Trained on 880,314 axial CT and MRI 2D images from 102,774 unique patients, covering 124 radiologic pathological classes across multiple anatomical regions (abdomen, brain, spine, knee, etc.).
  • RadImageGAN-Gastro: Trained on 5,713 colonoscopy images from HyperKvasir, modeling six gastrointestinal pathological classes.

All three modalities—CT, MRI, and endoscopy—are addressed. Preprocessing includes progressive resizing across six resolutions (16×16 to 512×512 px), leveraging prior stage weights for stability. X-axis flips are applied at all but the final stage. Class-conditional labels from RadImageNet guide the generation process, enabling control over anatomical and pathological content.
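The progressive-resolution preprocessing described above can be sketched as follows. This is a minimal numpy illustration, not the paper's pipeline: the function names, the nearest-neighbor resize, and the 50% flip probability are my own assumptions; only the six-resolution schedule and the "flip at all but the final stage" rule come from the text.

```python
import numpy as np

# Six progressive training resolutions, per the text.
RESOLUTIONS = [16, 32, 64, 128, 256, 512]

def resize_nearest(img, size):
    """Nearest-neighbor resize of a square 2D image (illustrative only)."""
    h, w = img.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[np.ix_(ys, xs)]

def preprocess(img, label, stage, rng):
    """Resize to the stage resolution and apply a random x-axis flip.
    Flips are applied at all but the final (512x512) stage."""
    out = resize_nearest(img, RESOLUTIONS[stage])
    if stage < len(RESOLUTIONS) - 1 and rng.random() < 0.5:
        out = out[:, ::-1]  # hypothetical 50% flip probability
    return out, label
```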

2. Model Architecture and Loss Functions

RadImageGAN builds on the StyleGAN-XL architecture, which extends StyleGAN3 with a larger latent space, greater sample diversity, and multi-class conditioning. The key components include:

  • Generator ($G$): Latent vectors $z \sim \mathcal{N}(0, I)$ are mapped to style vectors $w$, which modulate convolutional blocks through affine transformations. Progressive growing introduces new "head" layers at each upsampled resolution, culminating in 33 layers at 512×512 resolution.
  • Discriminator ($D$): A mirrored architecture that downsamples images, with class-conditional guidance for real/fake prediction.

Major architectural features:

  • Style modulation: The affine-transformed style vector $w$ modulates convolutional weights, enabling semantic control.
  • Class-conditioning: Categorical embeddings of pathology/anatomy condition both $G$ and $D$.
  • Progressive growing: Incremental blending of new layers stabilizes training at increasing resolutions.
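The style-modulation step can be illustrated with a small numpy sketch. This is a simplified rendering of StyleGAN-style weight modulation and demodulation, under my own function names and shapes; it is not code from the paper.

```python
import numpy as np

def modulated_weights(weights, style, affine_w, affine_b):
    """StyleGAN-style weight (de)modulation, simplified.
    weights:  (out_ch, in_ch, k, k) convolution kernel
    style:    (style_dim,) style vector w from the mapping network
    affine_w, affine_b: learned affine layer mapping style -> per-channel scales
    """
    s = affine_w @ style + affine_b          # (in_ch,) modulation scales
    w = weights * s[None, :, None, None]     # scale each input channel
    # Demodulation: renormalize every output filter to unit L2 norm.
    d = np.sqrt((w ** 2).sum(axis=(1, 2, 3)) + 1e-8)
    return w / d[:, None, None, None]
```

The demodulation step keeps activation magnitudes stable regardless of the style scales, which is what lets the style vector steer semantics without destabilizing training.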

Loss functions:

  • Adversarial loss:

L_{\text{adv}} = \mathbb{E}_{x \sim p_{\text{data}}}[-\log D(x)] + \mathbb{E}_{z \sim p_z}[-\log(1 - D(G(z)))]

  • R1 regularization:

L_{R1} = \frac{\gamma}{2}\,\mathbb{E}_{x \sim p_{\text{data}}}\left[\|\nabla_x D(x)\|^2\right]

  • Path-length regularization:

L_{pl} = \mathbb{E}_{z,\,\epsilon \sim \mathcal{N}(0,I)}\left[\left(\|\nabla_w G(w) \cdot \epsilon\| - a\right)^2\right]

with $w$ produced by the mapping network, $\epsilon$ random noise, and $a$ an exponential moving average (EMA) of observed path lengths.

Final objectives:

  • $L_G = \mathbb{E}_z[-\log D(G(z))] + \lambda_{pl} L_{pl}$
  • $L_D = \mathbb{E}_x[-\log D(x)] + \mathbb{E}_z[-\log(1 - D(G(z)))] + \lambda_{R1} L_{R1}$
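The loss terms above can be written out in a short numpy sketch. A few caveats: the function names are mine; the adversarial terms are expressed on raw discriminator logits (with $D(x) = \sigma(t)$, $\mathrm{softplus}(-t) = -\log D(x)$ and $\mathrm{softplus}(t) = -\log(1 - D(x))$); and the gradient inputs (`grad_real`, `jw_eps`) are assumed to be supplied by an autodiff framework, which numpy does not provide.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def gan_losses(d_real_logits, d_fake_logits):
    """Logistic GAN losses from raw discriminator logits (batch arrays)."""
    loss_d = softplus(-d_real_logits).mean() + softplus(d_fake_logits).mean()
    loss_g = softplus(-d_fake_logits).mean()   # non-saturating -log D(G(z))
    return loss_g, loss_d

def r1_penalty(grad_real, gamma=10.0):
    """(gamma / 2) * E[ ||grad_x D(x)||^2 ]; grad_real has shape (B, ...)."""
    b = grad_real.shape[0]
    return 0.5 * gamma * (grad_real.reshape(b, -1) ** 2).sum(axis=1).mean()

def path_length_penalty(jw_eps, a):
    """E[ (||J_w^T eps|| - a)^2 ]; jw_eps is the Jacobian-vector product, (B, d)."""
    norms = np.sqrt((jw_eps ** 2).sum(axis=1))
    return ((norms - a) ** 2).mean()
```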

3. Training Protocols and Computational Resources

Training utilized high-end GPU clusters:

  • RadImageGAN-CT/MR: an NVIDIA DGX A100 node (eight A100 GPUs, 640 GB VRAM total), 4,563 GPU-hours.
  • RadImageGAN-Gastro: an NVIDIA DGX V100 node (eight V100 GPUs, 256 GB VRAM total), 3,088 GPU-hours.

Batch sizes: 2,048 at 16×16 and 32×32, and 256 at 64×64 and above. Learning rates followed StyleGAN-XL defaults ($2\times 10^{-3}$ for both $G$ and $D$, with linear decay in the final stages). At each upscale, prior-stage weights were loaded and new head layers appended (seven per stage, five in the final stage).
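The per-stage schedule above can be captured in a small lookup helper. The function name and dictionary layout are my own; the numbers are taken directly from the text.

```python
def stage_config(resolution):
    """Per-stage hyperparameters for the progressive schedule described above."""
    return {
        # 2,048 at 16x16 and 32x32; 256 at 64x64 and above
        "batch_size": 2048 if resolution <= 32 else 256,
        # seven new head layers per stage, five in the final 512x512 stage
        "new_head_layers": 5 if resolution == 512 else 7,
        # StyleGAN-XL default learning rate for both G and D
        "lr": 2e-3,
    }
```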

4. Synthetic Image Fidelity and Diversity

RadImageGAN produces 512×512 px images across all supported classes and modalities. Output quality is characterized by Fréchet Inception Distance (FID), measured at each resolution.

Model                16×16   32×32   64×64   128×128  256×256  512×512
RadImageGAN-CT/MR    6.40    7.75    7.60    8.85     9.23     9.37
RadImageGAN-Gastro   8.08    9.12    11.36   8.74     5.50     5.05

Qualitative assessments indicate high realism in anatomical and pathological features, including contrast and lesion morphology for CT abdomen, MRI spine, and colonoscopy polyp classes.
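For reference, the FID values in the table compare Gaussians fitted to Inception feature embeddings of real and synthetic images. A compact numpy sketch of the distance itself (not the feature extraction, and not the paper's evaluation code; the function name is mine):

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID between two Gaussians fitted to feature embeddings:
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}).
    For PSD covariances the eigenvalues of C1 @ C2 are real and >= 0,
    so Tr((C1 C2)^{1/2}) equals the sum of their square roots."""
    diff = mu1 - mu2
    eig = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)
```

Lower values indicate synthetic feature statistics closer to the real data, which is why the per-resolution numbers in the table are read as "lower is better".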

5. Segmentation Annotation via BigDatasetGAN

Pixel-wise labeled data is achieved by integrating BigDatasetGAN. This approach attaches a lightweight "feature interpreter" segmentation head to the generator’s intermediate feature maps:

  1. Synthetic image set generation: $N = 50$ images per downstream task.
  2. Expert annotation: Manual masks labeling each class.
  3. Interpreter training: Generator weights frozen; an interpreter $S$ is trained on feature maps $\phi(x)$ to output per-pixel class probabilities $p_c(i)$ via softmax.
  4. Segmentation loss: Multi-class cross-entropy,

L_{\text{seg}} = -\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c} \log p_{i,c}

where $y_{i,c}$ is the one-hot ground-truth label for pixel $i$ and class $c$.

Training uses a learning rate of $1\times 10^{-4}$, a batch size of 4, and 100 epochs. This yields RadImageGAN-Labeled, capable of generating synthetic images with corresponding pixel-wise masks across multiple classes with minimal manual annotation.
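The interpreter-training step (frozen generator features in, per-pixel cross-entropy out) can be sketched as one full-batch gradient update on a linear per-pixel classifier. This is a deliberately simplified stand-in for the feature interpreter: the linear head, function name, and shapes are my assumptions; the frozen-features-plus-cross-entropy structure and the $10^{-4}$ learning rate come from the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def interpreter_step(feats, masks, W, b, lr=1e-4):
    """One gradient step for a linear per-pixel feature interpreter.
    feats: (P, F) frozen generator feature vectors, one per pixel
    masks: (P,) integer class labels from the expert annotations
    Returns the multi-class cross-entropy loss and updated parameters."""
    P = feats.shape[0]
    C = W.shape[1]
    probs = softmax(feats @ W + b)                 # (P, C) class probabilities
    onehot = np.eye(C)[masks]
    loss = -(onehot * np.log(probs + 1e-12)).sum() / P
    grad = (probs - onehot) / P                    # d(loss)/d(logits)
    return loss, W - lr * feats.T @ grad, b - lr * grad.sum(axis=0)
```

Because the generator is frozen, only the small interpreter head is optimized, which is why as few as 50 annotated images per task suffice.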

6. Impact in Downstream Segmentation

RadImageGAN-Labeled was evaluated on four public segmentation datasets (BTCV-Abdomen, CHAOS-MRI, Labeled Lumbar Spine MRI, CVC-ClinicDB) spanning liver, kidney, spine, and polyp segmentation. The nnU-Net framework was used under varying data regimes ("minimal real" to "full real + 100% synthetic") and two synthetic strategies: augmentation and pretraining.

Key findings:

  • Low-data regimes: Synthetic augmentation improved Dice scores (e.g., liver on BTCV: 0.582→0.678; polyp on CVC-ClinicDB: 0.377→0.685).
  • Synthetic pretraining plus augmentation: Comparable or higher gains (e.g., Dice 0.697 for polyp segmentation on CVC-ClinicDB in the minimal-real regime).
  • Moderate/high data: 50% synthetic augmentation yielded statistically significant improvements on challenging tasks ($p < 0.01$).
  • High-baseline tasks: Spine segmentation saw negligible augmentation benefit, but pretraining improved minimal-data performance (e.g., IVD Dice: 0.924→0.957).
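The augmentation strategy behind the "50% synthetic" and "100% synthetic" regimes can be expressed as a tiny helper: extend the real training set with a number of synthetic samples proportional to its size. A sketch under my own naming; the exact mixing mechanics in the evaluation are not specified beyond the ratio.

```python
def build_training_set(real_ids, synthetic_ids, synthetic_ratio):
    """Synthetic-augmentation sketch: append synthetic_ratio * len(real_ids)
    synthetic samples to the real training set (capped by availability)."""
    n_syn = min(len(synthetic_ids), int(len(real_ids) * synthetic_ratio))
    return list(real_ids) + list(synthetic_ids[:n_syn])
```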

7. Limitations and Future Directions

Limitations:

  • RadImageGAN generates only 2D slices, not volumetric data.
  • Pathology class definitions may conflate sequences/views, potentially diluting specificity.
  • Manual mask annotation was performed by single experts, introducing bias.
  • Cardiac MRI and ultrasound are not addressed.

Future research directions:

  • Extension to 3D volumetric GAN synthesis.
  • Expansion to additional modalities (ultrasound, cardiac MRI) and taxonomy refinement for class labels.
  • Unsupervised or pseudo-labeling of segmentation masks.
  • Exploration of synthetic/real data ratios and curriculum learning strategies for data efficiency and model robustness.

RadImageGAN constitutes a scalable framework for large-scale, multi-modal synthetic medical image generation and auto-labeled dataset creation, contributing substantially to data augmentation, transfer learning, and segmentation under both scarcity and abundance conditions (Liu et al., 2023).
