GANeXt: 3D Patch-Based GAN for CT Synthesis
- GANeXt is a 3D patch-based GAN that synthesizes CT volumes from MRI and CBCT with a U-shaped generator based on 3D ConvNeXt blocks, delivering high-fidelity anatomical reconstructions.
- It integrates multiple loss functions including voxelwise, perceptual, segmentation-aware, and adversarial losses to ensure structural and intensity accuracy.
- The framework supports unified, multi-region sCT generation for adaptive radiotherapy, employing robust training, augmentation, and sliding-window inference protocols.
GANeXt is a 3D, patch-based generative adversarial network (GAN) framework designed for the synthesis of computed tomography (CT) volumes from both magnetic resonance imaging (MRI) and cone-beam CT (CBCT), supporting unified, multi-region sCT generation crucial for adaptive radiotherapy. GANeXt features a fully convolutional U-shaped generator built entirely from 3D ConvNeXt blocks and utilizes a conditional PatchGAN discriminator, with multi-head segmentation for CBCT-to-CT tasks. The architecture harnesses a suite of loss functions—including voxelwise, perceptual, segmentation-aware, and adversarial losses—and employs robust training, augmentation, and inference protocols to maximize synthesis fidelity and anatomical quality (Mei et al., 22 Dec 2025).
1. Architectural Overview
GANeXt’s architecture revolves around two principal modules: a 3D U-shaped generator based on ConvNeXt blocks (denoted “GeNeXt” in this context), and a conditional PatchGAN discriminator.
ConvNeXt-Enhanced Generator
- GeNeXt Block: Each block takes an input feature map and applies a depthwise convolution (DWConv), instance normalization, and a pointwise-convolution FFN (channel expansion, GELU activation, projection back), closed by a residual connection; expansion ratios are stage-specific. A minimal sketch of such a block follows this list.
- Downsampling/Up-sampling: Downsampling blocks apply stride-2 DWConvs; shortcut paths use “resize-convs.” Upsampling blocks use stride-2 transposed DWConvs.
- U-Shape Composition: The stages are organized as an encoder (four downsampling stages), a bottleneck, and a decoder (four upsampling stages), with skip connections at each decoder stage realized by channel-wise concatenation and projection. The channel width and the number of GeNeXt blocks are stage-specific.
- Patch-based Processing: During training, fixed-size 3D patches (with different, task-specific sizes for MRI-to-CT and CBCT-to-CT) are extracted as generator input.
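The block structure described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the kernel size, affine instance normalization, and the default expansion ratio of 4 are assumptions of this sketch.

```python
# Minimal sketch of a 3D ConvNeXt-style ("GeNeXt") block: DWConv -> instance norm ->
# pointwise-conv FFN with GELU -> residual connection. Kernel size and expansion ratio
# are illustrative assumptions.
import torch
import torch.nn as nn


class GeNeXtBlock3D(nn.Module):
    def __init__(self, channels: int, expansion: int = 4, kernel_size: int = 7):
        super().__init__()
        self.dwconv = nn.Conv3d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,  # depthwise: one filter per channel
        )
        self.norm = nn.InstanceNorm3d(channels, affine=True)
        # Pointwise-convolution FFN: expand channels, apply GELU, project back.
        self.pwconv1 = nn.Conv3d(channels, expansion * channels, kernel_size=1)
        self.act = nn.GELU()
        self.pwconv2 = nn.Conv3d(expansion * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = self.norm(x)
        x = self.pwconv2(self.act(self.pwconv1(x)))
        return x + residual  # residual connection


# Example: one block applied to a 32-channel 3D feature-map patch.
feats = torch.randn(1, 32, 16, 16, 16)
print(GeNeXtBlock3D(32)(feats).shape)  # torch.Size([1, 32, 16, 16, 16])
```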
Conditional PatchGAN Discriminator
- Design: Receives the conditioning image and the real or generated CT concatenated along the channel dimension, processed by cascaded strided convolutions (instance normalization, LeakyReLU) that progressively downsample the input, so that each output logit covers a large patch-level receptive field. A minimal sketch of this discriminator follows this list.
- Outputs: Main head gives a map of real/fake logits; for CBCT-to-CT synthesis, an additional multi-class segmentation head applies a shallow decoder to re-upsample and predict anatomical structures.
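The following is a hedged sketch of such a conditional 3D PatchGAN discriminator. The channel widths, number of stages, kernel sizes, and the shallow segmentation decoder are illustrative assumptions rather than the authors' exact configuration.

```python
# Conditional 3D PatchGAN sketch: condition and image are concatenated on the channel
# axis, passed through stride-2 convolutions, and mapped to a grid of real/fake logits.
# An optional multi-class segmentation head upsamples back to input resolution.
import torch
import torch.nn as nn


class PatchGANDiscriminator3D(nn.Module):
    def __init__(self, in_channels: int = 2, base: int = 32, n_seg_classes: int = 0):
        super().__init__()
        layers, c = [], in_channels
        for width in (base, base * 2, base * 4, base * 8):
            layers += [
                nn.Conv3d(c, width, kernel_size=4, stride=2, padding=1),
                nn.InstanceNorm3d(width, affine=True),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            c = width
        self.backbone = nn.Sequential(*layers)
        self.realfake_head = nn.Conv3d(c, 1, kernel_size=3, padding=1)  # map of logits
        # Shallow segmentation decoder (CBCT-to-CT setting): project then re-upsample.
        self.seg_head = (
            nn.Sequential(
                nn.Conv3d(c, n_seg_classes, kernel_size=1),
                nn.Upsample(scale_factor=16, mode="trilinear", align_corners=False),
            )
            if n_seg_classes > 0 else None
        )

    def forward(self, condition: torch.Tensor, image: torch.Tensor):
        h = self.backbone(torch.cat([condition, image], dim=1))
        logits = self.realfake_head(h)
        seg = self.seg_head(h) if self.seg_head is not None else None
        return logits, seg


# Example: discriminate a (CBCT, CT) pair of 64^3 patches with 5 segmentation classes.
d = PatchGANDiscriminator3D(in_channels=2, n_seg_classes=5)
logits, seg = d(torch.randn(1, 1, 64, 64, 64), torch.randn(1, 1, 64, 64, 64))
```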
2. Loss Functions
GANeXt aggregates multiple complementary objectives during training:
| Loss Term | MRI→CT Weight | CBCT→CT Weight |
|---|---|---|
| Mean Absolute Error | $10$ | $10$ |
| Perceptual Loss (ConvNeXt-B LPIPS) | $1$ | $1$ |
| Segmentation-Masked MAE | $50$ | $10$ |
| Adversarial Loss | $10$ | $10$ |
| Feature-Matching Loss | $10$ | $10$ |
| Segmentation Loss (CBCT→CT only) | $0$ | $0.5$ |
- Mean Absolute Error: Promotes voxelwise intensity fidelity.
- Perceptual Loss: Utilizes ConvNeXt-B LPIPS to preserve structural realism at feature levels.
- Segmentation-Masked MAE: Restricts error assessment to anatomically relevant regions using a pretrained TotalSegmentator mask.
- Adversarial Loss: Implements standard cGAN formulation.
- Feature-Matching: Encourages generator alignment with discriminator features.
- Segmentation Loss: Applies multi-class Dice and cross-entropy to enforce anatomical accuracy during CBCT-to-CT synthesis, with separate components applied to the discriminator's and the generator's segmentation predictions. Hedged sketches of the masked-MAE and Dice-plus-cross-entropy terms follow this list.
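For concreteness, below are hedged sketches of the segmentation-masked MAE and the Dice + cross-entropy segmentation term. The exact reductions, smoothing constants, and masking conventions used by the authors are not specified, so these follow common practice.

```python
# Illustrative loss-term sketches; reductions and the smoothing epsilon are assumptions.
import torch
import torch.nn.functional as F


def masked_mae(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """MAE restricted to voxels inside a (TotalSegmentator-style) anatomical mask."""
    mask = mask.float()
    return (torch.abs(pred - target) * mask).sum() / mask.sum().clamp_min(1.0)


def dice_ce_loss(seg_logits: torch.Tensor, labels: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Multi-class Dice + cross-entropy for the auxiliary segmentation objective.

    seg_logits: (B, C, D, H, W) raw scores; labels: (B, D, H, W) integer class map.
    """
    ce = F.cross_entropy(seg_logits, labels)
    probs = torch.softmax(seg_logits, dim=1)
    onehot = F.one_hot(labels, num_classes=seg_logits.shape[1]).movedim(-1, 1).float()
    dims = tuple(range(2, seg_logits.ndim))  # spatial dimensions
    inter = (probs * onehot).sum(dims)
    denom = probs.sum(dims) + onehot.sum(dims)
    dice = 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()
    return dice + ce
```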
Generator Objectives
- MRI-to-CT: The generator minimizes the weighted sum of the terms in the table above, $\mathcal{L}_G = 10\,\mathcal{L}_{\text{MAE}} + \mathcal{L}_{\text{perc}} + 50\,\mathcal{L}_{\text{seg-MAE}} + 10\,\mathcal{L}_{\text{adv}} + 10\,\mathcal{L}_{\text{FM}}$.
- CBCT-to-CT: The same weighted sum, with the segmentation-masked MAE weight reduced to $10$ and the segmentation loss $\mathcal{L}_{\text{seg}}$ added at weight $0.5$.
Discriminator Objective (CBCT-to-CT)
The discriminator is trained with the real/fake adversarial term together with the segmentation loss computed on its auxiliary segmentation head; a hedged sketch of how the objectives are composed follows.
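The short sketch below shows how the tabulated weights could combine into the task-specific objectives. The helper names are placeholders, and the $0.5$ weight on the discriminator's segmentation component is an assumption that mirrors the generator's.

```python
# Hedged composition of the objectives from the weights in the table above.
# All inputs are assumed to be precomputed scalar loss tensors.

def generator_objective(l_mae, l_perc, l_seg_mae, l_adv, l_fm, l_seg=None, task="mri2ct"):
    """Weighted generator loss; CBCT-to-CT adds the 0.5-weighted segmentation term."""
    w_seg_mae = 50.0 if task == "mri2ct" else 10.0
    total = 10.0 * l_mae + 1.0 * l_perc + w_seg_mae * l_seg_mae + 10.0 * l_adv + 10.0 * l_fm
    if task == "cbct2ct" and l_seg is not None:
        total = total + 0.5 * l_seg
    return total


def discriminator_objective(l_real_fake, l_seg_real=None):
    """Real/fake adversarial term plus the auxiliary segmentation term (CBCT-to-CT only);
    the 0.5 segmentation weight here is assumed, not stated in the source."""
    return l_real_fake if l_seg_real is None else l_real_fake + 0.5 * l_seg_real
```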
3. Training Protocol
The training leverages modern optimization and regularization components:
- Optimizers: Generator: AdamW with weight decay $0.01$; Discriminator: AdamW for MRI-to-CT and Adam for CBCT-to-CT.
- Schedulers: Linear warmup over the first 5% of epochs, followed by cosine annealing to zero (MRI-to-CT: 3000 epochs; CBCT-to-CT: 1000 epochs).
- Batch Size: 8 per iteration, distributed over 4 NVIDIA A100 GPUs.
- Model Selection: Final checkpoint is used at epoch cutoff without further fine-tuning.
- Algorithmic Workflow: Each batch is augmented, the discriminator and generator are updated in separate steps, and the learning-rate schedulers advance per batch; the paper gives explicit pseudocode for this loop, and a hedged sketch follows this list.
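The following is a minimal sketch of this per-batch workflow, assuming the models, data loader, and loss helpers are supplied by the caller. The learning rate and the warmup computation are placeholders; only the optimizer types, weight decay, and warmup-then-cosine schedule follow the text (the MRI-to-CT configuration with AdamW for both networks is shown).

```python
# Per-batch GAN training loop sketch: augment -> update D -> update G -> step schedulers.
import math
import torch


def make_warmup_cosine(optimizer, total_steps: int, warmup_frac: float = 0.05):
    """Linear warmup over the first fraction of steps, then cosine annealing to zero."""
    warmup = max(1, int(warmup_frac * total_steps))

    def lr_lambda(step):
        if step < warmup:
            return (step + 1) / warmup
        progress = (step - warmup) / max(1, total_steps - warmup)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)


def train(generator, discriminator, loader, total_steps, gen_loss_fn, disc_loss_fn, lr=1e-4):
    # Optimizer types and weight decay follow the text; the learning rate is assumed.
    opt_g = torch.optim.AdamW(generator.parameters(), lr=lr, weight_decay=0.01)
    opt_d = torch.optim.AdamW(discriminator.parameters(), lr=lr)
    sched_g = make_warmup_cosine(opt_g, total_steps)
    sched_d = make_warmup_cosine(opt_d, total_steps)

    for inputs, targets in loader:  # batches of augmented 3D patches
        fake = generator(inputs)

        # Discriminator step (generator output detached).
        opt_d.zero_grad()
        disc_loss_fn(discriminator, inputs, targets, fake.detach()).backward()
        opt_d.step()

        # Generator step.
        opt_g.zero_grad()
        gen_loss_fn(discriminator, inputs, targets, fake).backward()
        opt_g.step()

        # Learning-rate schedules advance once per batch.
        sched_g.step()
        sched_d.step()
```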
4. Data Preprocessing and Augmentation
GANeXt applies rigorous preprocessing steps:
- Deformable Registration: All CTs are deformably registered to the challenge test-set reference for spatial correspondence with MRI/CBCT.
- Foreground Cropping: Anatomical region isolation via simple body mask generation (thresholding plus morphology).
- Intensity Normalization:
- Inputs (MRI/CBCT): Percentile clipping to the [0.5th, 99.5th] range, followed by rescaling to a fixed normalized intensity range.
- CT targets: Linear mapping of Hounsfield units to the same normalized range, using separate HU clipping windows for MRI-to-CT and CBCT-to-CT (a hedged sketch of this normalization follows this list).
- Augmentation:
- MRI-to-CT Only: Random zooming within a fixed scale range.
- Both tasks: Randomly cropped to fixed patch sizes, random horizontal/vertical flipping.
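A minimal sketch of these intensity-normalization steps is given below. The HU window values and the $[-1, 1]$ target range are illustrative assumptions, since the concrete modality-specific ranges are not reproduced here; only the percentile clipping of the inputs and the linear HU mapping of the CT targets follow the text.

```python
# Intensity normalization sketch for inputs (MRI/CBCT) and CT targets, plus the inverse
# mapping used at inference time. HU window and [-1, 1] range are assumptions.
import numpy as np


def normalize_input(volume: np.ndarray) -> np.ndarray:
    """Clip MRI/CBCT intensities to the [0.5th, 99.5th] percentiles and rescale to [-1, 1]."""
    lo, hi = np.percentile(volume, [0.5, 99.5])
    volume = np.clip(volume, lo, hi)
    return 2.0 * (volume - lo) / (hi - lo) - 1.0


def normalize_ct(ct_hu: np.ndarray, hu_min: float = -1024.0, hu_max: float = 3000.0) -> np.ndarray:
    """Linearly map Hounsfield units within an assumed clipping window to [-1, 1]."""
    ct_hu = np.clip(ct_hu, hu_min, hu_max)
    return 2.0 * (ct_hu - hu_min) / (hu_max - hu_min) - 1.0


def denormalize_ct(pred: np.ndarray, hu_min: float = -1024.0, hu_max: float = 3000.0) -> np.ndarray:
    """Invert the CT normalization at inference time to restore Hounsfield units."""
    return (pred + 1.0) * 0.5 * (hu_max - hu_min) + hu_min
```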
5. Inference and Output Reconstruction
Inference employs an efficient sliding-window strategy:
- Sliding-Window Inference: Patches overlap by $0.8$ in each dimension; each patch is processed by the generator, and the predictions are reconstructed by voxelwise averaging (“average folding”) to maintain consistency across overlapping regions (a sketch follows this list).
- Postprocessing: The CT intensity normalization is inverted to map network outputs back to Hounsfield units (see the denormalization sketch in Section 4).
- No Fine-Tuning: After joint training across all anatomical regions, the selected models undergo no additional region- or modality-specific adaptation.
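As an illustration, the same sliding-window scheme can be expressed with MONAI's `sliding_window_inference`; the paper does not state which implementation is used, and the patch size below is a placeholder. Constant blending corresponds to the plain voxelwise averaging described above.

```python
# Sliding-window inference sketch using MONAI (patch size is a placeholder).
import torch
from monai.inferers import sliding_window_inference


@torch.no_grad()
def synthesize_ct(generator: torch.nn.Module, volume: torch.Tensor) -> torch.Tensor:
    """volume: (1, 1, D, H, W) normalized MRI/CBCT; returns the normalized sCT prediction."""
    return sliding_window_inference(
        inputs=volume,
        roi_size=(128, 128, 128),  # placeholder patch size
        sw_batch_size=4,
        predictor=generator,
        overlap=0.8,               # 0.8 overlap per dimension, as in the text
        mode="constant",           # uniform voxelwise averaging of overlapping patches
    )
```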
6. Context and Performance
GANeXt’s stacking of 3D ConvNeXt modules within a U-Net topological framework, combined with multi-term regularization and adversarial objectives, yields state-of-the-art synthetic CT outputs for both MRI-to-CT and CBCT-to-CT targets. The approach remains purely convolutional and computationally efficient, while accommodating segmentation-based anatomical guidance for further accuracy in challenging CBCT-to-CT translation (Mei et al., 22 Dec 2025).
A plausible implication is that GANeXt’s architectural and loss-function innovations facilitate generalizable, high-fidelity medical image synthesis across multiple modalities and anatomical domains without recourse to modality- or region-specific retraining or post hoc adaptation.