GANeXt: 3D Patch-Based GAN for CT Synthesis
- GANeXt is a 3D patch-based GAN that synthesizes CT volumes from MRI and CBCT with a U-shaped generator based on 3D ConvNeXt blocks, delivering high-fidelity anatomical reconstructions.
- It integrates multiple loss functions including voxelwise, perceptual, segmentation-aware, and adversarial losses to ensure structural and intensity accuracy.
- The framework supports unified, multi-region sCT generation for adaptive radiotherapy, employing robust training, augmentation, and sliding-window inference protocols.
GANeXt is a 3D, patch-based generative adversarial network (GAN) framework designed for the synthesis of computed tomography (CT) volumes from both magnetic resonance imaging (MRI) and cone-beam CT (CBCT), supporting unified, multi-region sCT generation crucial for adaptive radiotherapy. GANeXt features a fully convolutional U-shaped generator built entirely from 3D ConvNeXt blocks and utilizes a conditional PatchGAN discriminator, with multi-head segmentation for CBCT-to-CT tasks. The architecture harnesses a suite of loss functions—including voxelwise, perceptual, segmentation-aware, and adversarial losses—and employs robust training, augmentation, and inference protocols to maximize synthesis fidelity and anatomical quality (Mei et al., 22 Dec 2025).
1. Architectural Overview
GANeXt’s architecture revolves around two principal modules: a 3D U-shaped generator based on ConvNeXt blocks (denoted “GeNeXt” in this context), and a conditional PatchGAN discriminator.
ConvNeXt-Enhanced Generator
- GeNeXt Block: Each block takes an input feature map and applies a depthwise convolution (DWConv), instance normalization, and a pointwise-convolution FFN (channel expansion, GELU activation, projection back), closed by a residual connection; expansion ratios are stage-specific. A minimal sketch of such a block follows this list.
- Downsampling/Up-sampling: Downsampling blocks apply stride-2 DWConvs; shortcut paths use “resize-convs.” Upsampling blocks use stride-2 transposed DWConvs.
- U-Shape Composition: The stages are organized as an encoder (four downsampling stages), a bottleneck, and a decoder (four upsampling stages), with skip connections at each decoder stage realized by channel-wise concatenation and projection. The channel width and the number of GeNeXt blocks are stage-specific.
- Patch-based Processing: During training, fixed-size 3D patches (with different, task-specific sizes for MRI-to-CT and CBCT-to-CT) are extracted as generator input.
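The block structure described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the kernel size, affine instance normalization, and the default expansion ratio of 4 are assumptions of this sketch.

```python
# Minimal sketch of a 3D ConvNeXt-style ("GeNeXt") block: DWConv -> instance norm ->
# pointwise-conv FFN with GELU -> residual connection. Kernel size and expansion ratio
# are illustrative assumptions.
import torch
import torch.nn as nn


class GeNeXtBlock3D(nn.Module):
    def __init__(self, channels: int, expansion: int = 4, kernel_size: int = 7):
        super().__init__()
        self.dwconv = nn.Conv3d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,  # depthwise: one filter per channel
        )
        self.norm = nn.InstanceNorm3d(channels, affine=True)
        # Pointwise-convolution FFN: expand channels, apply GELU, project back.
        self.pwconv1 = nn.Conv3d(channels, expansion * channels, kernel_size=1)
        self.act = nn.GELU()
        self.pwconv2 = nn.Conv3d(expansion * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = self.norm(x)
        x = self.pwconv2(self.act(self.pwconv1(x)))
        return x + residual  # residual connection


# Example: one block applied to a 32-channel 3D feature-map patch.
feats = torch.randn(1, 32, 16, 16, 16)
print(GeNeXtBlock3D(32)(feats).shape)  # torch.Size([1, 32, 16, 16, 16])
```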
Conditional PatchGAN Discriminator
- Design: Receives the conditioning image and the real or generated CT concatenated along the channel dimension, processed by cascaded strided convolutions (instance normalization, LeakyReLU) that progressively downsample the input, so that each output logit covers a large patch-level receptive field. A minimal sketch of this discriminator follows this list.
- Outputs: Main head gives a map of real/fake logits; for CBCT-to-CT synthesis, an additional multi-class segmentation head applies a shallow decoder to re-upsample and predict anatomical structures.
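The following is a hedged sketch of such a conditional 3D PatchGAN discriminator. The channel widths, number of stages, kernel sizes, and the shallow segmentation decoder are illustrative assumptions rather than the authors' exact configuration.

```python
# Conditional 3D PatchGAN sketch: condition and image are concatenated on the channel
# axis, passed through stride-2 convolutions, and mapped to a grid of real/fake logits.
# An optional multi-class segmentation head upsamples back to input resolution.
import torch
import torch.nn as nn


class PatchGANDiscriminator3D(nn.Module):
    def __init__(self, in_channels: int = 2, base: int = 32, n_seg_classes: int = 0):
        super().__init__()
        layers, c = [], in_channels
        for width in (base, base * 2, base * 4, base * 8):
            layers += [
                nn.Conv3d(c, width, kernel_size=4, stride=2, padding=1),
                nn.InstanceNorm3d(width, affine=True),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            c = width
        self.backbone = nn.Sequential(*layers)
        self.realfake_head = nn.Conv3d(c, 1, kernel_size=3, padding=1)  # map of logits
        # Shallow segmentation decoder (CBCT-to-CT setting): project then re-upsample.
        self.seg_head = (
            nn.Sequential(
                nn.Conv3d(c, n_seg_classes, kernel_size=1),
                nn.Upsample(scale_factor=16, mode="trilinear", align_corners=False),
            )
            if n_seg_classes > 0 else None
        )

    def forward(self, condition: torch.Tensor, image: torch.Tensor):
        h = self.backbone(torch.cat([condition, image], dim=1))
        logits = self.realfake_head(h)
        seg = self.seg_head(h) if self.seg_head is not None else None
        return logits, seg


# Example: discriminate a (CBCT, CT) pair of 64^3 patches with 5 segmentation classes.
d = PatchGANDiscriminator3D(in_channels=2, n_seg_classes=5)
logits, seg = d(torch.randn(1, 1, 64, 64, 64), torch.randn(1, 1, 64, 64, 64))
```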
2. Loss Functions
GANeXt aggregates multiple complementary objectives during training:
| Loss Term | MRI→CT Weight | CBCT→CT Weight |
|---|---|---|
| Mean Absolute Error | $10$ | $10$ |
| Perceptual Loss (ConvNeXt-B LPIPS) | $1$ | $1$ |
| Segmentation-Masked MAE | $50$ | $10$ |
| Adversarial Loss | $10$ | $10$ |
| Feature-Matching Loss | $10$ | $10$ |
| Segmentation Loss (CBCT→CT only) | $0$ | $0.5$ |
- Mean Absolute Error: Promotes voxelwise intensity fidelity.
- Perceptual Loss: Utilizes ConvNeXt-B LPIPS to preserve structural realism at feature levels.
- Segmentation-Masked MAE: Restricts error assessment to anatomically relevant regions using a pretrained TotalSegmentator mask.
- Adversarial Loss: Implements standard cGAN formulation.
- Feature-Matching: Encourages generator alignment with discriminator features.
- Segmentation Loss: Applies multi-class Dice and cross-entropy to enforce anatomical accuracy during CBCT-to-CT synthesis, with separate components applied to the discriminator's and the generator's segmentation predictions. Hedged sketches of the masked-MAE and Dice-plus-cross-entropy terms follow this list.
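For concreteness, below are hedged sketches of the segmentation-masked MAE and the Dice + cross-entropy segmentation term. The exact reductions, smoothing constants, and masking conventions used by the authors are not specified, so these follow common practice.

```python
# Illustrative loss-term sketches; reductions and the smoothing epsilon are assumptions.
import torch
import torch.nn.functional as F


def masked_mae(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """MAE restricted to voxels inside a (TotalSegmentator-style) anatomical mask."""
    mask = mask.float()
    return (torch.abs(pred - target) * mask).sum() / mask.sum().clamp_min(1.0)


def dice_ce_loss(seg_logits: torch.Tensor, labels: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Multi-class Dice + cross-entropy for the auxiliary segmentation objective.

    seg_logits: (B, C, D, H, W) raw scores; labels: (B, D, H, W) integer class map.
    """
    ce = F.cross_entropy(seg_logits, labels)
    probs = torch.softmax(seg_logits, dim=1)
    onehot = F.one_hot(labels, num_classes=seg_logits.shape[1]).movedim(-1, 1).float()
    dims = tuple(range(2, seg_logits.ndim))  # spatial dimensions
    inter = (probs * onehot).sum(dims)
    denom = probs.sum(dims) + onehot.sum(dims)
    dice = 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()
    return dice + ce
```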
Generator Objectives
- MRI-to-CT: The generator minimizes the weighted sum of the terms in the table above, $\mathcal{L}_G = 10\,\mathcal{L}_{\text{MAE}} + \mathcal{L}_{\text{perc}} + 50\,\mathcal{L}_{\text{seg-MAE}} + 10\,\mathcal{L}_{\text{adv}} + 10\,\mathcal{L}_{\text{FM}}$.
- CBCT-to-CT: The same weighted sum, with the segmentation-masked MAE weight reduced to $10$ and the segmentation loss $\mathcal{L}_{\text{seg}}$ added at weight $0.5$.
Discriminator Objective (CBCT-to-CT)
The discriminator is trained with the real/fake adversarial term together with the segmentation loss computed on its auxiliary segmentation head; a hedged sketch of how the objectives are composed follows.
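The short sketch below shows how the tabulated weights could combine into the task-specific objectives. The helper names are placeholders, and the $0.5$ weight on the discriminator's segmentation component is an assumption that mirrors the generator's.

```python
# Hedged composition of the objectives from the weights in the table above.
# All inputs are assumed to be precomputed scalar loss tensors.

def generator_objective(l_mae, l_perc, l_seg_mae, l_adv, l_fm, l_seg=None, task="mri2ct"):
    """Weighted generator loss; CBCT-to-CT adds the 0.5-weighted segmentation term."""
    w_seg_mae = 50.0 if task == "mri2ct" else 10.0
    total = 10.0 * l_mae + 1.0 * l_perc + w_seg_mae * l_seg_mae + 10.0 * l_adv + 10.0 * l_fm
    if task == "cbct2ct" and l_seg is not None:
        total = total + 0.5 * l_seg
    return total


def discriminator_objective(l_real_fake, l_seg_real=None):
    """Real/fake adversarial term plus the auxiliary segmentation term (CBCT-to-CT only);
    the 0.5 segmentation weight here is assumed, not stated in the source."""
    return l_real_fake if l_seg_real is None else l_real_fake + 0.5 * l_seg_real
```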
3. Training Protocol
The training leverages modern optimization and regularization components:
- Optimizers: Generator: AdamW with weight decay $0.01$; Discriminator: AdamW for MRI-to-CT and Adam for CBCT-to-CT.
- Schedulers: Linear warmup over the first 5% of epochs, followed by cosine annealing to zero (MRI-to-CT: 3000 epochs; CBCT-to-CT: 1000 epochs).
- Batch Size: 8 per iteration, distributed over 4 NVIDIA A100 GPUs.
- Model Selection: Final checkpoint is used at epoch cutoff without further fine-tuning.
- Algorithmic Workflow: Each batch is augmented, the discriminator and generator are updated in separate steps, and the learning-rate schedulers advance per batch; the paper gives explicit pseudocode for this loop, and a hedged sketch follows this list.
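The following is a minimal sketch of this per-batch workflow, assuming the models, data loader, and loss helpers are supplied by the caller. The learning rate and the warmup computation are placeholders; only the optimizer types, weight decay, and warmup-then-cosine schedule follow the text (the MRI-to-CT configuration with AdamW for both networks is shown).

```python
# Per-batch GAN training loop sketch: augment -> update D -> update G -> step schedulers.
import math
import torch


def make_warmup_cosine(optimizer, total_steps: int, warmup_frac: float = 0.05):
    """Linear warmup over the first fraction of steps, then cosine annealing to zero."""
    warmup = max(1, int(warmup_frac * total_steps))

    def lr_lambda(step):
        if step < warmup:
            return (step + 1) / warmup
        progress = (step - warmup) / max(1, total_steps - warmup)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)


def train(generator, discriminator, loader, total_steps, gen_loss_fn, disc_loss_fn, lr=1e-4):
    # Optimizer types and weight decay follow the text; the learning rate is assumed.
    opt_g = torch.optim.AdamW(generator.parameters(), lr=lr, weight_decay=0.01)
    opt_d = torch.optim.AdamW(discriminator.parameters(), lr=lr)
    sched_g = make_warmup_cosine(opt_g, total_steps)
    sched_d = make_warmup_cosine(opt_d, total_steps)

    for inputs, targets in loader:  # batches of augmented 3D patches
        fake = generator(inputs)

        # Discriminator step (generator output detached).
        opt_d.zero_grad()
        disc_loss_fn(discriminator, inputs, targets, fake.detach()).backward()
        opt_d.step()

        # Generator step.
        opt_g.zero_grad()
        gen_loss_fn(discriminator, inputs, targets, fake).backward()
        opt_g.step()

        # Learning-rate schedules advance once per batch.
        sched_g.step()
        sched_d.step()
```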
4. Data Preprocessing and Augmentation
GANeXt applies rigorous preprocessing steps:
- Deformable Registration: All CTs are deformably registered to the challenge test-set reference for spatial correspondence with MRI/CBCT.
- Foreground Cropping: Anatomical region isolation via simple body mask generation (thresholding plus morphology).
- Intensity Normalization:
- Inputs (MRI/CBCT): Percentile clipping to the [0.5th, 99.5th] range, followed by rescaling to a fixed normalized intensity range.
- CT targets: Linear mapping of Hounsfield units to the same normalized range, using separate HU clipping windows for MRI-to-CT and CBCT-to-CT (a hedged sketch of this normalization follows this list).
- Augmentation:
- MRI-to-CT Only: Random zooming within a fixed scale range.
- Both tasks: Randomly cropped to fixed patch sizes, random horizontal/vertical flipping.
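A minimal sketch of these intensity-normalization steps is given below. The HU window values and the $[-1, 1]$ target range are illustrative assumptions, since the concrete modality-specific ranges are not reproduced here; only the percentile clipping of the inputs and the linear HU mapping of the CT targets follow the text.

```python
# Intensity normalization sketch for inputs (MRI/CBCT) and CT targets, plus the inverse
# mapping used at inference time. HU window and [-1, 1] range are assumptions.
import numpy as np


def normalize_input(volume: np.ndarray) -> np.ndarray:
    """Clip MRI/CBCT intensities to the [0.5th, 99.5th] percentiles and rescale to [-1, 1]."""
    lo, hi = np.percentile(volume, [0.5, 99.5])
    volume = np.clip(volume, lo, hi)
    return 2.0 * (volume - lo) / (hi - lo) - 1.0


def normalize_ct(ct_hu: np.ndarray, hu_min: float = -1024.0, hu_max: float = 3000.0) -> np.ndarray:
    """Linearly map Hounsfield units within an assumed clipping window to [-1, 1]."""
    ct_hu = np.clip(ct_hu, hu_min, hu_max)
    return 2.0 * (ct_hu - hu_min) / (hu_max - hu_min) - 1.0


def denormalize_ct(pred: np.ndarray, hu_min: float = -1024.0, hu_max: float = 3000.0) -> np.ndarray:
    """Invert the CT normalization at inference time to restore Hounsfield units."""
    return (pred + 1.0) * 0.5 * (hu_max - hu_min) + hu_min
```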
5. Inference and Output Reconstruction
Inference employs an efficient sliding-window strategy:
- Sliding-Window Inference: Patches overlap by $0.8$ in each dimension; each patch is processed by the generator, and the predictions are reconstructed by voxelwise averaging (“average folding”) to maintain consistency across overlapping regions (a sketch follows this list).
- Postprocessing: The CT intensity normalization is inverted to map network outputs back to Hounsfield units (see the denormalization sketch in Section 4).
- No Fine-Tuning: After joint training across all anatomical regions, the selected models undergo no additional region- or modality-specific adaptation.
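As an illustration, the same sliding-window scheme can be expressed with MONAI's `sliding_window_inference`; the paper does not state which implementation is used, and the patch size below is a placeholder. Constant blending corresponds to the plain voxelwise averaging described above.

```python
# Sliding-window inference sketch using MONAI (patch size is a placeholder).
import torch
from monai.inferers import sliding_window_inference


@torch.no_grad()
def synthesize_ct(generator: torch.nn.Module, volume: torch.Tensor) -> torch.Tensor:
    """volume: (1, 1, D, H, W) normalized MRI/CBCT; returns the normalized sCT prediction."""
    return sliding_window_inference(
        inputs=volume,
        roi_size=(128, 128, 128),  # placeholder patch size
        sw_batch_size=4,
        predictor=generator,
        overlap=0.8,               # 0.8 overlap per dimension, as in the text
        mode="constant",           # uniform voxelwise averaging of overlapping patches
    )
```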
6. Context and Performance
GANeXt’s stacking of 3D ConvNeXt modules within a U-Net topological framework, combined with multi-term regularization and adversarial objectives, yields state-of-the-art synthetic CT outputs for both MRI-to-CT and CBCT-to-CT targets. The approach remains purely convolutional and computationally efficient, while accommodating segmentation-based anatomical guidance for further accuracy in challenging CBCT-to-CT translation (Mei et al., 22 Dec 2025).
A plausible implication is that GANeXt’s architectural and loss-function innovations facilitate generalizable, high-fidelity medical image synthesis across multiple modalities and anatomical domains without recourse to modality- or region-specific retraining or post hoc adaptation.