
RadImageGAN: Multi-Modal Synthetic Imaging

Updated 16 February 2026
  • RadImageGAN is a multi-modal generative adversarial network that synthesizes high-resolution CT, MRI, and endoscopy images using large-scale datasets and class-conditioning.
  • It employs progressive resizing, style modulation, and conditional training to achieve high image fidelity and robust anatomical/pathological diversity.
  • Integration with BigDatasetGAN enables pixel-wise segmentation, enhancing data augmentation and transfer learning for complex medical imaging tasks.

RadImageGAN is a multi-modal, dataset-scale generative adversarial network (GAN) designed to synthesize high-resolution, multi-class medical images spanning computed tomography (CT), magnetic resonance imaging (MRI), and endoscopy. Trained on the large-scale RadImageNet database and extended with BigDatasetGAN for pixel-wise segmentation annotations, RadImageGAN enables the generation of synthetic labeled datasets across 12 anatomical regions and 130 pathological classes. The system is engineered to address the scarcity and annotation cost issues inherent to medical imaging datasets by supporting diverse data augmentation and transfer learning for downstream segmentation tasks (Liu et al., 2023).

1. Dataset Foundation and Preprocessing

RadImageGAN utilizes subsets of RadImageNet and HyperKvasir for training two generator instances:

  • RadImageGAN-CT/MR: Trained on 880,314 axial CT and MRI 2D images from 102,774 unique patients, covering 124 radiologic pathological classes across multiple anatomical regions (abdomen, brain, spine, knee, etc.).
  • RadImageGAN-Gastro: Trained on 5,713 colonoscopy images from HyperKvasir, modeling six gastrointestinal pathological classes.

All three modalities—CT, MRI, and endoscopy—are addressed. Preprocessing includes progressive resizing across six resolutions (16×16 to 512×512 px), leveraging prior stage weights for stability. X-axis flips are applied at all but the final stage. Class-conditional labels from RadImageNet guide the generation process, enabling control over anatomical and pathological content.
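The progressive-resolution preprocessing described above can be sketched as follows. This is a minimal numpy illustration, not the paper's pipeline: the function names, the nearest-neighbor resize, and the 50% flip probability are my own assumptions; only the six-resolution schedule and the "flip at all but the final stage" rule come from the text.

```python
import numpy as np

# Six progressive training resolutions, per the text.
RESOLUTIONS = [16, 32, 64, 128, 256, 512]

def resize_nearest(img, size):
    """Nearest-neighbor resize of a square 2D image (illustrative only)."""
    h, w = img.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[np.ix_(ys, xs)]

def preprocess(img, label, stage, rng):
    """Resize to the stage resolution and apply a random x-axis flip.
    Flips are applied at all but the final (512x512) stage."""
    out = resize_nearest(img, RESOLUTIONS[stage])
    if stage < len(RESOLUTIONS) - 1 and rng.random() < 0.5:
        out = out[:, ::-1]  # hypothetical 50% flip probability
    return out, label
```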

2. Model Architecture and Loss Functions

RadImageGAN builds on the StyleGAN-XL architecture, which extends StyleGAN3 with a larger latent space, greater sample diversity, and multi-class conditioning. The key components include:

  • Generator ($G$): Latent vectors $z \sim \mathcal{N}(0, I)$ are mapped to style vectors $w$, which modulate convolutional blocks through affine transformations. Progressive growing introduces new "head" layers at each upsampled resolution, culminating in 33 layers at 512×512 resolution.
  • Discriminator ($D$): A mirrored architecture that downsamples images, with class-conditional guidance for real/fake prediction.

Major architectural features:

  • Style modulation: The affine-transformed style vector $w$ modulates convolutional weights, enabling semantic control.
  • Class-conditioning: Categorical embeddings of pathology/anatomy condition both $G$ and $D$.
  • Progressive growing: Incremental blending of new layers stabilizes training at increasing resolutions.
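The style-modulation step can be illustrated with a small numpy sketch. This is a simplified rendering of StyleGAN-style weight modulation and demodulation, under my own function names and shapes; it is not code from the paper.

```python
import numpy as np

def modulated_weights(weights, style, affine_w, affine_b):
    """StyleGAN-style weight (de)modulation, simplified.
    weights:  (out_ch, in_ch, k, k) convolution kernel
    style:    (style_dim,) style vector w from the mapping network
    affine_w, affine_b: learned affine layer mapping style -> per-channel scales
    """
    s = affine_w @ style + affine_b          # (in_ch,) modulation scales
    w = weights * s[None, :, None, None]     # scale each input channel
    # Demodulation: renormalize every output filter to unit L2 norm.
    d = np.sqrt((w ** 2).sum(axis=(1, 2, 3)) + 1e-8)
    return w / d[:, None, None, None]
```

The demodulation step keeps activation magnitudes stable regardless of the style scales, which is what lets the style vector steer semantics without destabilizing training.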

Loss functions:

  • Adversarial loss:

L_{\text{adv}} = \mathbb{E}_{x \sim p_{\text{data}}}[-\log D(x)] + \mathbb{E}_{z \sim p_z}[-\log(1 - D(G(z)))]

  • R1 regularization:

L_{R1} = \frac{\gamma}{2}\,\mathbb{E}_{x \sim p_{\text{data}}}\left[\|\nabla_x D(x)\|^2\right]

  • Path-length regularization:

L_{pl} = \mathbb{E}_{z,\,\epsilon \sim \mathcal{N}(0,I)}\left[\left(\|\nabla_w G(w) \cdot \epsilon\| - a\right)^2\right]

with $w$ produced by the mapping network, $\epsilon$ random noise, and $a$ an exponential moving average (EMA) of observed path lengths.

Final objectives:

  • $L_G = \mathbb{E}_z[-\log D(G(z))] + \lambda_{pl} L_{pl}$
  • $L_D = \mathbb{E}_x[-\log D(x)] + \mathbb{E}_z[-\log(1 - D(G(z)))] + \lambda_{R1} L_{R1}$
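The loss terms above can be written out in a short numpy sketch. A few caveats: the function names are mine; the adversarial terms are expressed on raw discriminator logits (with $D(x) = \sigma(t)$, $\mathrm{softplus}(-t) = -\log D(x)$ and $\mathrm{softplus}(t) = -\log(1 - D(x))$); and the gradient inputs (`grad_real`, `jw_eps`) are assumed to be supplied by an autodiff framework, which numpy does not provide.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def gan_losses(d_real_logits, d_fake_logits):
    """Logistic GAN losses from raw discriminator logits (batch arrays)."""
    loss_d = softplus(-d_real_logits).mean() + softplus(d_fake_logits).mean()
    loss_g = softplus(-d_fake_logits).mean()   # non-saturating -log D(G(z))
    return loss_g, loss_d

def r1_penalty(grad_real, gamma=10.0):
    """(gamma / 2) * E[ ||grad_x D(x)||^2 ]; grad_real has shape (B, ...)."""
    b = grad_real.shape[0]
    return 0.5 * gamma * (grad_real.reshape(b, -1) ** 2).sum(axis=1).mean()

def path_length_penalty(jw_eps, a):
    """E[ (||J_w^T eps|| - a)^2 ]; jw_eps is the Jacobian-vector product, (B, d)."""
    norms = np.sqrt((jw_eps ** 2).sum(axis=1))
    return ((norms - a) ** 2).mean()
```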

3. Training Protocols and Computational Resources

Training utilized high-end GPU clusters:

  • RadImageGAN-CT/MR: an NVIDIA DGX A100 node (eight A100 GPUs, 640 GB VRAM total), 4,563 GPU-hours.
  • RadImageGAN-Gastro: an NVIDIA DGX V100 node (eight V100 GPUs, 256 GB VRAM total), 3,088 GPU-hours.

Batch sizes: 2,048 at 16×16 and 32×32, and 256 at 64×64 and above. Learning rates followed StyleGAN-XL defaults ($2\times 10^{-3}$ for both $G$ and $D$, with linear decay in the final stages). At each upscale, prior-stage weights were loaded and new head layers appended (seven per stage, five in the final stage).
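The per-stage schedule above can be captured in a small lookup helper. The function name and dictionary layout are my own; the numbers are taken directly from the text.

```python
def stage_config(resolution):
    """Per-stage hyperparameters for the progressive schedule described above."""
    return {
        # 2,048 at 16x16 and 32x32; 256 at 64x64 and above
        "batch_size": 2048 if resolution <= 32 else 256,
        # seven new head layers per stage, five in the final 512x512 stage
        "new_head_layers": 5 if resolution == 512 else 7,
        # StyleGAN-XL default learning rate for both G and D
        "lr": 2e-3,
    }
```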

4. Synthetic Image Fidelity and Diversity

RadImageGAN produces 512×512 px images across all supported classes and modalities. Output quality is characterized by Fréchet Inception Distance (FID), measured at each resolution.

Model                16×16   32×32   64×64   128×128  256×256  512×512
RadImageGAN-CT/MR    6.40    7.75    7.60    8.85     9.23     9.37
RadImageGAN-Gastro   8.08    9.12    11.36   8.74     5.50     5.05

Qualitative assessments indicate high realism in anatomical and pathological features, including contrast and lesion morphology for CT abdomen, MRI spine, and colonoscopy polyp classes.
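For reference, the FID values in the table compare Gaussians fitted to Inception feature embeddings of real and synthetic images. A compact numpy sketch of the distance itself (not the feature extraction, and not the paper's evaluation code; the function name is mine):

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID between two Gaussians fitted to feature embeddings:
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}).
    For PSD covariances the eigenvalues of C1 @ C2 are real and >= 0,
    so Tr((C1 C2)^{1/2}) equals the sum of their square roots."""
    diff = mu1 - mu2
    eig = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)
```

Lower values indicate synthetic feature statistics closer to the real data, which is why the per-resolution numbers in the table are read as "lower is better".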

5. Segmentation Annotation via BigDatasetGAN

Pixel-wise labeled data is achieved by integrating BigDatasetGAN. This approach attaches a lightweight "feature interpreter" segmentation head to the generator’s intermediate feature maps:

  1. Synthetic image set generation: $N = 50$ images per downstream task.
  2. Expert annotation: Manual masks labeling each class.
  3. Interpreter training: Generator weights frozen; an interpreter $S$ is trained on feature maps $\phi(x)$ to output per-pixel class probabilities $p_c(i)$ via softmax.
  4. Segmentation loss: Multi-class cross-entropy,

L_{\text{seg}} = -\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c} \log p_{i,c}

where $y_{i,c}$ is the one-hot ground-truth label for pixel $i$ and class $c$.

Training uses a learning rate of $1\times 10^{-4}$, a batch size of 4, and 100 epochs. This yields RadImageGAN-Labeled, capable of generating synthetic images with corresponding pixel-wise masks across multiple classes with minimal manual annotation.
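The interpreter-training step (frozen generator features in, per-pixel cross-entropy out) can be sketched as one full-batch gradient update on a linear per-pixel classifier. This is a deliberately simplified stand-in for the feature interpreter: the linear head, function name, and shapes are my assumptions; the frozen-features-plus-cross-entropy structure and the $10^{-4}$ learning rate come from the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def interpreter_step(feats, masks, W, b, lr=1e-4):
    """One gradient step for a linear per-pixel feature interpreter.
    feats: (P, F) frozen generator feature vectors, one per pixel
    masks: (P,) integer class labels from the expert annotations
    Returns the multi-class cross-entropy loss and updated parameters."""
    P = feats.shape[0]
    C = W.shape[1]
    probs = softmax(feats @ W + b)                 # (P, C) class probabilities
    onehot = np.eye(C)[masks]
    loss = -(onehot * np.log(probs + 1e-12)).sum() / P
    grad = (probs - onehot) / P                    # d(loss)/d(logits)
    return loss, W - lr * feats.T @ grad, b - lr * grad.sum(axis=0)
```

Because the generator is frozen, only the small interpreter head is optimized, which is why as few as 50 annotated images per task suffice.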

6. Impact in Downstream Segmentation

RadImageGAN-Labeled was evaluated on four public segmentation datasets (BTCV-Abdomen, CHAOS-MRI, Labeled Lumbar Spine MRI, CVC-ClinicDB) spanning liver, kidney, spine, and polyp segmentation. The nnU-Net framework was used under varying data regimes ("minimal real" to "full real + 100% synthetic") and two synthetic strategies: augmentation and pretraining.

Key findings:

  • Low-data regimes: Synthetic augmentation improved Dice scores (e.g., liver on BTCV: 0.582→0.678; polyp on CVC-ClinicDB: 0.377→0.685).
  • Synthetic pretraining plus augmentation: Comparable or higher gains (e.g., Dice 0.697 for polyp segmentation on CVC-ClinicDB in the minimal-real regime).
  • Moderate/high data: 50% synthetic augmentation yielded statistically significant improvements on challenging tasks ($p < 0.01$).
  • High-baseline tasks: Spine segmentation saw negligible augmentation benefit, but pretraining improved minimal-data performance (e.g., IVD Dice: 0.924→0.957).
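The augmentation strategy behind the "50% synthetic" and "100% synthetic" regimes can be expressed as a tiny helper: extend the real training set with a number of synthetic samples proportional to its size. A sketch under my own naming; the exact mixing mechanics in the evaluation are not specified beyond the ratio.

```python
def build_training_set(real_ids, synthetic_ids, synthetic_ratio):
    """Synthetic-augmentation sketch: append synthetic_ratio * len(real_ids)
    synthetic samples to the real training set (capped by availability)."""
    n_syn = min(len(synthetic_ids), int(len(real_ids) * synthetic_ratio))
    return list(real_ids) + list(synthetic_ids[:n_syn])
```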

7. Limitations and Future Directions

Limitations:

  • RadImageGAN generates only 2D slices, not volumetric data.
  • Pathology class definitions may conflate sequences/views, potentially diluting specificity.
  • Manual mask annotation was performed by single experts, introducing bias.
  • Cardiac MRI and ultrasound are not addressed.

Future research directions:

  • Extension to 3D volumetric GAN synthesis.
  • Expansion to additional modalities (ultrasound, cardiac MRI) and taxonomy refinement for class labels.
  • Unsupervised or pseudo-labeling of segmentation masks.
  • Exploration of synthetic/real data ratios and curriculum learning strategies for data efficiency and model robustness.

RadImageGAN constitutes a scalable framework for large-scale, multi-modal synthetic medical image generation and auto-labeled dataset creation, contributing substantially to data augmentation, transfer learning, and segmentation under both scarcity and abundance conditions (Liu et al., 2023).
