
MedNeXt-v2: 3D CNN Backbone for Medical Segmentation

Updated 16 April 2026
  • MedNeXt-v2 is a 3D convolutional neural network backbone designed for volumetric medical image segmentation that extends ConvNeXt principles.
  • It incorporates advanced normalization, compound scaling, flexible kernel adaptation (UpKern), and deep supervision to achieve state-of-the-art performance across diverse modalities and anatomical regions.
  • Its robust pretraining on large-scale CT volumes combined with tailored fine-tuning strategies results in enhanced segmentation accuracy for organs, tumors, and lesions.

MedNeXt-v2 is a 3D convolutional neural network backbone that extends ConvNeXt-style design principles for application in volumetric medical image segmentation and downstream medical tasks. It incorporates advanced normalization, compound scaling, flexible kernel adaptation strategies, and strong supervised pretraining to achieve state-of-the-art (SOTA) performance across diverse modalities, anatomical regions, and clinical segmentation challenges. MedNeXt-v2 is widely used in both generalist large-scale representation learning workflows and task-specific architectures for organ, tumor, and lesion segmentation in computed tomography (CT) and magnetic resonance (MR) imaging (Roy et al., 19 Dec 2025, Musah, 3 Aug 2025, Jaheen et al., 31 Jul 2025).

1. MedNeXt-v2 Micro-Architecture and Core Components

MedNeXt-v2 utilizes a five-level 3D U-Net macro-architecture with ConvNeXt-style residual blocks as both encoder and decoder units. The essential micro-architectural features are:

  • 3D ConvNeXt Block: Each block comprises a 3×3×3 depthwise convolution, instance normalization, 1×1×1 pointwise channel expansion (e.g., to 7× width), GELU activation, a 3D Global Response Normalization (GRN) module, a 1×1×1 pointwise projection, and residual skip connection.
  • 3D Global Response Normalization (GRN): For a feature tensor $X \in \mathbb{R}^{C\times H\times W\times D}$, GRN is defined as

$$N(X_i) = \frac{\|X_i\|_2}{\sum_{j=1}^{C}\|X_j\|_2}, \qquad Y_i = \gamma_i\,\big(X_i\,N(X_i)\big) + \beta_i + X_i$$

with learnable per-channel parameters $\gamma_i$, $\beta_i$, stabilizing activation norms over large expansion ratios. This mechanism prevents activation collapse and over-dominance of individual channels (Roy et al., 19 Dec 2025).

  • Deep Supervision: Auxiliary 1×1×1 convolutional outputs at intermediate decoder levels with corresponding decoder-stage ground truth supervision.
  • Flexible Downsampling/Upsampling: Strided depthwise or transposed-depthwise convolutions in a residual formulation enable efficient multi-scale feature learning.

This core design is shared across MedNeXt-v2 applications in both general supervised pretraining (Roy et al., 19 Dec 2025) and domain-adapted task-specific variants (Musah, 3 Aug 2025, Jaheen et al., 31 Jul 2025).
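The GRN operation described above can be sketched directly in NumPy. This is a minimal illustration of the formula, not the reference implementation; the function name `grn_3d` and its argument layout are assumptions:

```python
import numpy as np

def grn_3d(x, gamma, beta, eps=1e-6):
    """Global Response Normalization for a (C, H, W, D) feature tensor."""
    # Per-channel L2 norm over the spatial dimensions: ||X_i||_2
    norms = np.sqrt((x ** 2).sum(axis=(1, 2, 3)))
    # Competitive normalization across channels: N(X_i)
    n = (norms / (norms.sum() + eps))[:, None, None, None]
    g = gamma[:, None, None, None]
    b = beta[:, None, None, None]
    # Y_i = gamma_i * (X_i * N(X_i)) + beta_i + X_i  (residual keeps an identity path)
    return g * (x * n) + b + x
```

Note that with $\gamma$ and $\beta$ initialized to zero, GRN reduces to the identity, so training starts from a plain ConvNeXt-style block.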

2. Compound Scaling and Model Variants

MedNeXt-v2 employs compound scaling, inspired by EfficientNet, with simultaneous scaling of depth, width (channels), and input context (patch size):

  • For depth $d$, width $w$, and input size $s$, parameterized by $\phi$:

$$d = d_0 \cdot \alpha^{\phi}, \qquad w = w_0 \cdot \beta^{\phi}, \qquad s = s_0 \cdot \gamma^{\phi}$$

where $d_0$, $w_0$, $s_0$ are baseline values and $\alpha$, $\beta$, $\gamma$ control the scaling of each dimension.

  • Canonical Variants:
    • Base: 52 layers, default channels, default patch size.
    • Patch×1.5: 52 layers, default channels, 1.5× enlarged patch size (increased context).
    • Width×2.0: 52 layers, double channel width, default patch size.

Performance saturates under pure width scaling: the Width×2.0 variant, despite substantially more parameters, does not surpass the smaller Patch×1.5 variant in mean segmentation accuracy, indicating that augmenting context and architecture yields greater returns than naïve parameter scaling (Roy et al., 19 Dec 2025).
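The compound-scaling rule can be written out directly. The baseline values and scaling coefficients below are illustrative placeholders, not the constants used in the paper:

```python
def compound_scale(phi, d0=52, w0=32, s0=128, alpha=1.2, beta=1.1, gamma=1.15):
    """Compound scaling of depth d, width w, and patch size s by exponent phi.

    d0/w0/s0 and alpha/beta/gamma are hypothetical values for illustration.
    """
    d = round(d0 * alpha ** phi)   # depth: number of layers
    w = round(w0 * beta ** phi)    # width: base channel count
    s = round(s0 * gamma ** phi)   # context: cubic patch side length
    return d, w, s
```

At $\phi = 0$ the baseline configuration is recovered; increasing $\phi$ grows all three dimensions jointly rather than scaling any one in isolation.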

3. Training Protocols and Pretraining Regimes

Large-scale supervised pretraining is central to MedNeXt-v2’s utility:

  • Pretraining: Conducted on 18,000 CT volumes from the CADS collection (44 anatomical labels), using z-score normalization, isotropic resampling, and extensive augmentations.
  • Optimization: AdamW, batch size 8 (2 per GPU), 1500 epochs, distributed training with nnU-Net augmentation/loss protocols.
  • Fine-tuning: Warmup for 50 epochs, then 250 further epochs on specific tasks/datasets, patch size variant-dependent.
  • Task Coverage: Demonstrated on six CT/MR benchmarks representing 144 structures: pediatric organs, knee MR, CBCT dental classes, brain metastases, pancreatic tumor, and vertebrae (Roy et al., 19 Dec 2025).

Task-specific adaptations (e.g., EMedNeXt) use similar pretraining, followed by fine-tuning and parameter unfreezing strategies to accommodate low-resource settings or domain shifts (e.g., SSA low-field MRIs), sometimes freezing encoder and only unfreezing late decoder blocks (Jaheen et al., 31 Jul 2025).

4. Specialized Architectural and Training Innovations

Several key innovations distinguish MedNeXt-v2 and its derivatives:

  • UpKern Algorithm: Allows kernel size expansion (e.g., from 3×3×3 to 5×5×5) without random reinitialization, by trilinearly interpolating the trained small-kernel weights:

$$W_{k'} = \mathrm{Trilinear}(W_{k}), \quad k' > k$$

where $W_k$ denotes the trained depthwise kernel of size $k\times k\times k$ and $W_{k'}$ the initialized larger kernel.
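A minimal sketch of the UpKern idea, resizing a trained depthwise kernel by separable linear interpolation along each axis (the function name and the per-axis approach are illustrative assumptions):

```python
import numpy as np

def upkern_resize(w_small, k_new):
    """Initialize a k_new^3 kernel from a trained k^3 kernel by trilinear resizing."""
    w = np.asarray(w_small, dtype=float)
    k = w.shape[0]
    old = np.linspace(0.0, 1.0, k)
    new = np.linspace(0.0, 1.0, k_new)
    # Trilinear interpolation done separably: linear resize along each axis in turn
    for axis in range(3):
        w = np.apply_along_axis(lambda v: np.interp(new, old, v), axis, w)
    return w
```

Corner weights are preserved exactly, so the enlarged kernel starts from the trained solution rather than from random noise.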

  • Receptive Field Analysis: Formalized via the standard recursion

$$r_l = r_{l-1} + (k_l - 1)\prod_{i=1}^{l-1} s_i$$

for layer $l$ with kernel size $k_l$ and strides $s_i$. Kernel upgrade increases the effective receptive field by up to 70% (from 67 voxels for 3×3×3 to 115 voxels for 5×5×5 in typical layouts) (Musah, 3 Aug 2025).
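The recursion can be evaluated layer by layer. The helper below is a generic receptive-field calculator; the example layer stacks are assumptions for illustration, not the actual MedNeXt-v2 layout:

```python
def receptive_field(layers):
    """Compute the effective receptive field for a list of (kernel, stride) layers."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) * current jump
        jump *= s              # downsampling multiplies the step between outputs
    return rf

# Hypothetical encoder: four stages of one conv each, with stride-2 downsampling
stack_3 = [(3, 1), (3, 2), (3, 2), (3, 2)]
stack_5 = [(5, 1), (5, 2), (5, 2), (5, 2)]
```

Even in this toy layout, swapping 3×3×3 kernels for 5×5×5 roughly doubles the receptive field (17 → 33 voxels), illustrating why kernel upgrades pay off disproportionately in deep stacks.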

  • Deep Supervision with Boundary-Aware Loss (EMedNeXt):
    • Multi-level outputs supervised by Dice-Focal plus boundary loss at each output level.
    • Combined loss: $\mathcal{L} = \mathcal{L}_{\text{Dice-Focal}} + \lambda\,\mathcal{L}_{\text{boundary}}$, with $\lambda$ weighting the boundary term.
    • Boundary loss leverages a channel-wise 3D Sobel operator, enabling more precise boundary localization (Jaheen et al., 31 Jul 2025).
  • Region of Interest (ROI) Enlargement: Enlarged input patches facilitate global context acquisition in the presence of large or diffuse pathologies (Jaheen et al., 31 Jul 2025).
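The multi-level deep supervision can be combined with decaying per-level weights. The halving scheme below follows common nnU-Net practice and is an assumption, not a value taken from the paper:

```python
def deep_supervision_loss(level_losses, decay=0.5):
    """Weighted, normalized sum of per-level losses (full resolution first).

    Each level_losses[i] is the combined Dice-Focal + boundary loss at decoder
    level i; the weight halves per level under the assumed decay of 0.5.
    """
    weights = [decay ** i for i in range(len(level_losses))]
    total = sum(w * l for w, l in zip(weights, level_losses))
    return total / sum(weights)
```

Normalizing by the weight sum keeps the loss scale independent of how many auxiliary outputs are supervised.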

5. Quantitative Results and Benchmark Comparisons

MedNeXt-v2 consistently establishes new SOTA in multi-organ and pathological region segmentation:

| Pretrained Backbone | Mean DSC | Mean NSD | Reference |
|---|---|---|---|
| nnU-Net (scratch) | 80.57 | 78.49 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 (scratch) | 82.31 | 80.34 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 Base | 82.95 | 81.06 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 Patch×1.5 | 83.70 | 81.77 | (Roy et al., 19 Dec 2025) |
| EMedNeXt (SSA Glioma) | 0.897* | 0.541* | (Jaheen et al., 31 Jul 2025) |

*Average Lesion-Wise DSC, NSD (0.5 mm tolerance), hidden SSA validation set.

For breast tumor segmentation in DCE-MRI:

  • Dice scores: 0.67 (large-kernel ensemble, post-UpKern), up from 0.64 (baseline 3×3×3); NormHD improved to 0.24 (Musah, 3 Aug 2025).

Ablation establishes that backbone validation accuracy predicts pretraining transfer, and context scaling generally outperforms width scaling in mean accuracy (Roy et al., 19 Dec 2025).

6. Downstream Task Integration and Pipeline Extensions

MedNeXt-v2 supports a variety of downstream tasks and specialized pipelines:

  • Radiomics-Driven pCR Classification: In breast DCE-MRI, 40 radiomic features from segmentations are passed to a self-normalizing all-FC network (SNN: [128, 64, 32] → 1, SELU activations). Yields 57% average balanced accuracy, peaking at 75% in certain subgroups (Musah, 3 Aug 2025).
  • EMedNeXt Post-Processing: For low-resource MR, employs:
    • Sliding-window inference with test-time augmentation.
    • Class-specific probability thresholding and connected component pruning.
    • Hierarchical label enforcement, constraining nested classes to remain mutually consistent.
    • Priority-label fusion strategy to ensure robust, interpretable outputs (Jaheen et al., 31 Jul 2025).
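The connected-component pruning step can be sketched with a plain breadth-first flood fill. This is a self-contained illustration; real pipelines typically rely on library routines such as `scipy.ndimage.label`:

```python
import numpy as np
from collections import deque

NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def largest_component(mask):
    """Keep only the largest 6-connected component of a binary 3D mask."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros_like(mask)
    best, best_size = np.zeros_like(mask), 0
    for start in zip(*np.nonzero(mask)):
        if visited[start]:
            continue
        # Breadth-first flood fill to collect one connected component
        comp, q = [], deque([start])
        visited[start] = True
        while q:
            z, y, x = q.popleft()
            comp.append((z, y, x))
            for dz, dy, dx in NEIGHBORS:
                n = (z + dz, y + dy, x + dx)
                if all(0 <= n[i] < mask.shape[i] for i in range(3)) \
                        and mask[n] and not visited[n]:
                    visited[n] = True
                    q.append(n)
        if len(comp) > best_size:  # retain only the biggest component seen so far
            best_size = len(comp)
            best = np.zeros_like(mask)
            for v in comp:
                best[v] = True
    return best
```

Applied per class after probability thresholding, this suppresses small spurious detections while preserving the dominant lesion or organ mask.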

7. Relationship to Previous MedNeXt and Research Implications

MedNeXt-v2 extends MedNeXt-v1, which was limited to fixed 3×3×3 kernels and lacked GRN, compound scaling, or explicit context scaling strategies:

  • Kernel expansion and UpKern yield up to +70% receptive field growth and up to +0.03 Dice gain in challenging tasks (Musah, 3 Aug 2025).
  • Advanced micro-architecture, especially GRN, and deep supervision have demonstrable impact on transfer learning and boundary precision.
  • Representation scaling benefits are disproportionately large for pathological (vs. anatomical) segmentation tasks.

Pretraining generalizes across modality boundaries; fine-tuning with complete dataset access eliminates most benefits of modality-specific pretraining (Roy et al., 19 Dec 2025). A plausible implication is that domain-agnostic representations suffice given sufficient fine-tuning, diminishing the need for highly specialized pretraining pipelines.
