MedNeXt-v2: 3D CNN Backbone for Medical Segmentation
- MedNeXt-v2 is a 3D convolutional neural network backbone designed for volumetric medical image segmentation that extends ConvNeXt principles.
- It incorporates advanced normalization, compound scaling, flexible kernel adaptation (UpKern), and deep supervision to achieve state-of-the-art performance across diverse modalities and anatomical regions.
- Its robust pretraining on large-scale CT volumes combined with tailored fine-tuning strategies results in enhanced segmentation accuracy for organs, tumors, and lesions.
MedNeXt-v2 is a 3D convolutional neural network backbone that extends ConvNeXt-style design principles for application in volumetric medical image segmentation and downstream medical tasks. It incorporates advanced normalization, compound scaling, flexible kernel adaptation strategies, and strong supervised pretraining to achieve state-of-the-art (SOTA) performance across diverse modalities, anatomical regions, and clinical segmentation challenges. MedNeXt-v2 is widely used in both generalist large-scale representation learning workflows and task-specific architectures for organ, tumor, and lesion segmentation in computed tomography (CT) and magnetic resonance (MR) imaging (Roy et al., 19 Dec 2025, Musah, 3 Aug 2025, Jaheen et al., 31 Jul 2025).
1. MedNeXt-v2 Micro-Architecture and Core Components
MedNeXt-v2 utilizes a five-level 3D U-Net macro-architecture with ConvNeXt-style residual blocks as both encoder and decoder units. The essential micro-architectural features are:
- 3D ConvNeXt Block: Each block comprises a 3×3×3 depthwise convolution, instance normalization, 1×1×1 pointwise channel expansion (e.g., to 7× width), GELU activation, a 3D Global Response Normalization (GRN) module, a 1×1×1 pointwise projection, and residual skip connection.
- 3D Global Response Normalization (GRN): For a feature tensor $X \in \mathbb{R}^{C \times D \times H \times W}$, GRN aggregates a per-channel L2 norm over the spatial dimensions, $G(X)_c = \lVert X_c \rVert_2$, normalizes it divisively, $N(X)_c = G(X)_c \,/\, \tfrac{1}{C}\sum_{c'} G(X)_{c'}$, and rescales the input with a residual term, $\mathrm{GRN}(X)_c = \gamma_c\, X_c\, N(X)_c + \beta_c + X_c$,
with learnable parameters $\gamma_c$, $\beta_c$ per channel, stabilizing activation norms over large expansion ratios. This mechanism prevents activation collapse and over-dominance of individual channels (Roy et al., 19 Dec 2025).
- Deep Supervision: Auxiliary 1×1×1 convolutional outputs at intermediate decoder levels with corresponding decoder-stage ground truth supervision.
- Flexible Downsampling/Upsampling: Strided depthwise or transposed-depthwise convolutions in a residual formulation enable efficient multi-scale feature learning.
This core design is shared across MedNeXt-v2 applications in both general supervised pretraining (Roy et al., 19 Dec 2025) and domain-adapted task-specific variants (Musah, 3 Aug 2025, Jaheen et al., 31 Jul 2025).
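The GRN step at the heart of the block can be sketched in plain NumPy (a stand-in for the actual 3D module; the channels-first tensor layout and function name are illustrative, not from the papers):

```python
import numpy as np

def grn_3d(x, gamma, beta, eps=1e-6):
    """3D Global Response Normalization (ConvNeXt-V2 style), channels-first.

    x: feature tensor of shape (C, D, H, W)
    gamma, beta: learnable per-channel parameters of shape (C,)
    """
    # Per-channel L2 norm aggregated over the spatial dimensions: G(X)_c
    g = np.sqrt((x ** 2).sum(axis=(1, 2, 3)))
    # Divisive normalization against the channel-mean norm: N(X)_c
    n = g / (g.mean() + eps)
    # Rescale each channel and add the residual input back
    scale = n[:, None, None, None]
    return gamma[:, None, None, None] * x * scale + beta[:, None, None, None] + x
```

Note that with `gamma` and `beta` initialized to zero the module reduces to the identity, which is how GRN is typically warm-started so it does not perturb early training.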
2. Compound Scaling and Model Variants
MedNeXt-v2 employs compound scaling, inspired by EfficientNet, with simultaneous scaling of depth, width (channels), and input context (patch size):
- For depth $d$, width $w$, and input patch size $r$, parameterized by a single compound coefficient $\phi$:
$d = d_0\,\alpha^{\phi}, \qquad w = w_0\,\beta^{\phi}, \qquad r = r_0\,\gamma^{\phi}$
where $d_0$, $w_0$, $r_0$ are baseline values and $\alpha$, $\beta$, $\gamma$ control scaling per dimension.
- Canonical Variants:
- Base: 52 layers, default channel width, baseline cubic patch size.
- Patch×1.5: 52 layers, default channel width, 1.5× enlarged cubic patch size (increased context).
- Width×2.0: 52 layers, double channel width, baseline cubic patch size.
Performance saturates under pure width scaling: the larger parameter budget of Width×2.0 does not surpass Patch×1.5 in mean segmentation accuracy, indicating that augmenting input context yields greater returns than naïve parameter scaling (Roy et al., 19 Dec 2025).
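The compound-scaling rule above amounts to a few lines of arithmetic; the sketch below illustrates it with hypothetical baseline values and coefficients (the papers do not report these exact numbers):

```python
def compound_scale(phi, d0=52, w0=32, r0=128, alpha=1.0, beta=1.0, gamma=1.0):
    """EfficientNet-style compound scaling for a 3D segmentation backbone.

    phi: single compound coefficient
    d0, w0, r0: baseline depth (layers), channel width, and cubic patch edge
                (illustrative values, not the paper's)
    alpha, beta, gamma: per-dimension scaling coefficients
    """
    depth = round(d0 * alpha ** phi)   # number of layers
    width = round(w0 * beta ** phi)    # base channel count
    patch = round(r0 * gamma ** phi)   # edge length of the cubic input patch
    return depth, width, patch
```

Setting only `gamma > 1` grows context (the Patch×1.5 regime), while setting only `beta > 1` grows width (the Width×2.0 regime), so the two canonical variants correspond to different single-axis choices of these coefficients.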
3. Training Protocols and Pretraining Regimes
Large-scale supervised pretraining is central to MedNeXt-v2’s utility:
- Pretraining: Conducted on 18,000 CT volumes from the CADS collection (44 anatomical labels), using z-score normalization, isotropic resampling, and extensive augmentations.
- Optimization: AdamW, batch size 8 (2 per GPU), 1500 epochs, distributed training with nnU-Net augmentation/loss protocols.
- Fine-tuning: Warmup for 50 epochs, then 250 further epochs on specific tasks/datasets, patch size variant-dependent.
- Task Coverage: Demonstrated on six CT/MR benchmarks representing 144 structures: pediatric organs, knee MR, CBCT dental classes, brain metastases, pancreatic tumor, and vertebrae (Roy et al., 19 Dec 2025).
Task-specific adaptations (e.g., EMedNeXt) use similar pretraining, followed by fine-tuning and parameter unfreezing strategies to accommodate low-resource settings or domain shifts (e.g., SSA low-field MRIs), sometimes freezing encoder and only unfreezing late decoder blocks (Jaheen et al., 31 Jul 2025).
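The freeze/unfreeze strategy for low-resource fine-tuning reduces to name-based selection over the model's parameters. The sketch below shows only that selection logic with a minimal stand-in class; the module names are hypothetical, and in a real framework (e.g. PyTorch) the same pattern would iterate `named_parameters()` and toggle `requires_grad`:

```python
class Param:
    """Minimal stand-in for a framework parameter with a trainability flag."""
    def __init__(self, name):
        self.name = name
        self.requires_grad = True

def apply_freeze_policy(params, unfreeze_prefixes):
    """Freeze everything, then re-enable gradients only for parameters whose
    (dotted) name starts with one of the given submodule prefixes."""
    for p in params:
        p.requires_grad = any(p.name.startswith(pre) for pre in unfreeze_prefixes)
    return params

# Hypothetical parameter names: encoder frozen, late decoder + head trainable.
params = [Param(n) for n in [
    "encoder.stage0.dwconv", "encoder.stage4.dwconv",
    "decoder.stage0.dwconv", "decoder.stage3.dwconv", "seg_head.conv",
]]
apply_freeze_policy(params, unfreeze_prefixes=("decoder.stage3", "seg_head"))
trainable = [p.name for p in params if p.requires_grad]
```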
4. Specialized Architectural and Training Innovations
Several key innovations distinguish MedNeXt-v2 and its derivatives:
- UpKern Algorithm: Allows kernel size expansion (e.g., from 3×3×3 to 5×5×5) without random reinitialization, by trilinearly interpolating the pretrained small-kernel weights:
$W_{k'} = \mathrm{TrilinearInterp}(W_k), \qquad k' > k,$
so the enlarged kernel inherits the learned spatial structure of the smaller one.
- Receptive Field Analysis: The receptive field after layer $l$ is formalized recursively as
$\mathrm{RF}_l = \mathrm{RF}_{l-1} + (k_l - 1) \prod_{i<l} s_i,$
with kernel sizes $k_l$ and strides $s_i$. Kernel upgrade increases the effective receptive field by up to 70% (from 67 voxels for 3×3×3 to 115 voxels for 5×5×5 in typical layouts) (Musah, 3 Aug 2025).
- Deep Supervision with Boundary-Aware Loss (EMedNeXt):
- Multi-level outputs supervised by Dice-Focal plus boundary loss at each output level.
- Combined loss: $\mathcal{L} = \sum_l w_l \big( \mathcal{L}_{\text{DiceFocal}}^{(l)} + \lambda\, \mathcal{L}_{\text{boundary}}^{(l)} \big)$, summing weighted Dice-Focal and boundary terms over the supervision levels $l$.
- Boundary loss leverages a channel-wise 3D Sobel operator, affording more precise boundary localization (Jaheen et al., 31 Jul 2025).
- Region of Interest (ROI) Enlargement: Enlarged input patches facilitate global context acquisition in the presence of large or diffuse pathologies (Jaheen et al., 31 Jul 2025).
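Two of the innovations above, UpKern-style kernel growth and the recursive receptive-field formula, can be sketched together in NumPy. This is an illustrative re-implementation under stated assumptions (align-corners trilinear sampling, weight layout `(Cout, Cin, k, k, k)`), not the authors' code:

```python
import numpy as np

def upkern_resize(w, k_new):
    """Grow a pretrained (Cout, Cin, k, k, k) conv kernel to k_new^3 by
    separable linear interpolation per spatial axis (trilinear overall),
    instead of random re-initialization."""
    def resize_axis(a, axis, n):
        old = a.shape[axis]
        src = np.linspace(0, old - 1, n)        # align-corners sampling grid
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, old - 1)
        t = src - lo                            # fractional offset in [0, 1)
        a_lo = np.take(a, lo, axis=axis)
        a_hi = np.take(a, hi, axis=axis)
        shape = [1] * a.ndim
        shape[axis] = n
        t = t.reshape(shape)
        return a_lo * (1 - t) + a_hi * t
    for ax in (2, 3, 4):                        # the three spatial axes
        w = resize_axis(w, ax, k_new)
    return w

def receptive_field(layers):
    """RF_l = RF_{l-1} + (k_l - 1) * prod(strides before layer l)."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf
```

With align-corners sampling the corner weights of the small kernel are reproduced exactly in the enlarged kernel, which is what lets the upgraded network start from (approximately) the pretrained function rather than from scratch.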
5. Quantitative Results and Benchmark Comparisons
MedNeXt-v2 consistently establishes new SOTA in multi-organ and pathological region segmentation:
| Pretrained Backbone | Mean DSC | Mean NSD | Reference |
|---|---|---|---|
| nnU-Net (scratch) | 80.57 | 78.49 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 (scratch) | 82.31 | 80.34 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 Base | 82.95 | 81.06 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 Patch×1.5 | 83.70 | 81.77 | (Roy et al., 19 Dec 2025) |
| EMedNeXt (SSA Glioma) | 0.897* | 0.541* | (Jaheen et al., 31 Jul 2025) |
*Average Lesion-Wise DSC, NSD (0.5 mm tolerance), hidden SSA validation set.
For breast tumor segmentation in DCE-MRI:
- Dice scores: 0.67 (large-kernel ensemble, post-UpKern), up from 0.64 (baseline 3×3×3), NormHD improved to 0.24 (Musah, 3 Aug 2025).
Ablation establishes that backbone validation accuracy predicts pretraining transfer, and context scaling generally outperforms width scaling in mean accuracy (Roy et al., 19 Dec 2025).
6. Downstream Task Integration and Pipeline Extensions
MedNeXt-v2 supports a variety of downstream tasks and specialized pipelines:
- Radiomics-Driven pCR Classification: In breast DCE-MRI, 40 radiomic features from segmentations are passed to a self-normalizing all-FC network (SNN: [128, 64, 32] → 1, SELU activations). Yields 57% average balanced accuracy, peaking at 75% in certain subgroups (Musah, 3 Aug 2025).
- EMedNeXt Post-Processing: For low-resource MR, employs:
- Sliding-window inference with test-time augmentation.
- Class-specific probability thresholding and connected component pruning.
- Hierarchical label enforcement (e.g., requiring nested tumor sub-regions to remain mutually consistent).
- Priority-label fusion strategy to ensure robust, interpretable outputs (Jaheen et al., 31 Jul 2025).
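The connected-component pruning step can be sketched in plain NumPy with a BFS flood fill (a minimal stand-in; production pipelines would typically use `scipy.ndimage.label`, and the 6-connectivity and size threshold here are assumptions):

```python
import numpy as np
from collections import deque

def prune_small_components(mask, min_voxels):
    """Keep only 6-connected foreground components with >= min_voxels voxels."""
    mask = mask.astype(bool)
    visited = np.zeros_like(mask)
    out = np.zeros_like(mask)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for start in zip(*np.nonzero(mask)):
        if visited[start]:
            continue
        # BFS flood fill to collect one connected component
        comp, q = [start], deque([start])
        visited[start] = True
        while q:
            z, y, x = q.popleft()
            for dz, dy, dx in offsets:
                nz, ny, nx = z + dz, y + dy, x + dx
                if (0 <= nz < mask.shape[0] and 0 <= ny < mask.shape[1]
                        and 0 <= nx < mask.shape[2]
                        and mask[nz, ny, nx] and not visited[nz, ny, nx]):
                    visited[nz, ny, nx] = True
                    q.append((nz, ny, nx))
                    comp.append((nz, ny, nx))
        # Retain the component only if it clears the size threshold
        if len(comp) >= min_voxels:
            for v in comp:
                out[v] = True
    return out
```

In the full pipeline this pruning would run after probability thresholding and before hierarchical enforcement and priority-label fusion, discarding spurious speckle predictions that survive thresholding.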
7. Relationship to Previous MedNeXt and Research Implications
MedNeXt-v2 extends MedNeXt-v1, which was limited to fixed 3×3×3 kernels and lacked GRN, compound scaling, or explicit context scaling strategies:
- Kernel expansion with UpKern yields up to 70% receptive field growth and up to +0.03 Dice gain in challenging tasks (Musah, 3 Aug 2025).
- Advanced micro-architecture, especially GRN, and deep supervision have demonstrable impact on transfer learning and boundary precision.
- Representation scaling benefits are disproportionately large for pathological (vs. anatomical) segmentation tasks.
Pretraining generalizes across modality boundaries; fine-tuning with complete dataset access eliminates most benefits of modality-specific pretraining (Roy et al., 19 Dec 2025). A plausible implication is that domain-agnostic representations suffice given sufficient fine-tuning, diminishing the need for highly specialized pretraining pipelines.
References
- (Roy et al., 19 Dec 2025) MedNeXt-v2: Scaling 3D ConvNeXts for Large-Scale Supervised Representation Learning in Medical Image Segmentation
- (Musah, 3 Aug 2025) Large Kernel MedNeXt for Breast Tumor Segmentation and Self-Normalizing Network for pCR Classification in Magnetic Resonance Images
- (Jaheen et al., 31 Jul 2025) EMedNeXt: An Enhanced Brain Tumor Segmentation Framework for Sub-Saharan Africa using MedNeXt V2 with Deep Supervision