MedNeXt-v2: 3D CNN Backbone for Medical Segmentation
- MedNeXt-v2 is a 3D convolutional neural network backbone designed for volumetric medical image segmentation that extends ConvNeXt principles.
- It incorporates advanced normalization, compound scaling, flexible kernel adaptation (UpKern), and deep supervision to achieve state-of-the-art performance across diverse modalities and anatomical regions.
- Its robust pretraining on large-scale CT volumes combined with tailored fine-tuning strategies results in enhanced segmentation accuracy for organs, tumors, and lesions.
MedNeXt-v2 is a 3D convolutional neural network backbone that extends ConvNeXt-style design principles for application in volumetric medical image segmentation and downstream medical tasks. It incorporates advanced normalization, compound scaling, flexible kernel adaptation strategies, and strong supervised pretraining to achieve state-of-the-art (SOTA) performance across diverse modalities, anatomical regions, and clinical segmentation challenges. MedNeXt-v2 is widely used in both generalist large-scale representation learning workflows and task-specific architectures for organ, tumor, and lesion segmentation in computed tomography (CT) and magnetic resonance (MR) imaging (Roy et al., 19 Dec 2025, Musah, 3 Aug 2025, Jaheen et al., 31 Jul 2025).
1. MedNeXt-v2 Micro-Architecture and Core Components
MedNeXt-v2 utilizes a five-level 3D U-Net macro-architecture with ConvNeXt-style residual blocks as both encoder and decoder units. The essential micro-architectural features are:
- 3D ConvNeXt Block: Each block comprises a 3×3×3 depthwise convolution, instance normalization, 1×1×1 pointwise channel expansion (e.g., to 7× width), GELU activation, a 3D Global Response Normalization (GRN) module, a 1×1×1 pointwise projection, and residual skip connection.
- 3D Global Response Normalization (GRN): For a feature tensor $X \in \mathbb{R}^{C \times D \times H \times W}$, GRN aggregates a per-channel L2 norm over the spatial dimensions, $G(X)_c = \lVert X_c \rVert_2$, normalizes it divisively, $N(X)_c = G(X)_c \,/\, \tfrac{1}{C}\sum_{c'} G(X)_{c'}$, and rescales the input with a residual term, $\mathrm{GRN}(X)_c = \gamma_c\, X_c\, N(X)_c + \beta_c + X_c$,
with learnable parameters $\gamma_c$, $\beta_c$ per channel, stabilizing activation norms over large expansion ratios. This mechanism prevents activation collapse and over-dominance of individual channels (Roy et al., 19 Dec 2025).
- Deep Supervision: Auxiliary 1×1×1 convolutional outputs at intermediate decoder levels with corresponding decoder-stage ground truth supervision.
- Flexible Downsampling/Upsampling: Strided depthwise or transposed-depthwise convolutions in a residual formulation enable efficient multi-scale feature learning.
This core design is shared across MedNeXt-v2 applications in both general supervised pretraining (Roy et al., 19 Dec 2025) and domain-adapted task-specific variants (Musah, 3 Aug 2025, Jaheen et al., 31 Jul 2025).
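The GRN step at the heart of the block can be sketched in plain NumPy (a stand-in for the actual 3D module; the channels-first tensor layout and function name are illustrative, not from the papers):

```python
import numpy as np

def grn_3d(x, gamma, beta, eps=1e-6):
    """3D Global Response Normalization (ConvNeXt-V2 style), channels-first.

    x: feature tensor of shape (C, D, H, W)
    gamma, beta: learnable per-channel parameters of shape (C,)
    """
    # Per-channel L2 norm aggregated over the spatial dimensions: G(X)_c
    g = np.sqrt((x ** 2).sum(axis=(1, 2, 3)))
    # Divisive normalization against the channel-mean norm: N(X)_c
    n = g / (g.mean() + eps)
    # Rescale each channel and add the residual input back
    scale = n[:, None, None, None]
    return gamma[:, None, None, None] * x * scale + beta[:, None, None, None] + x
```

Note that with `gamma` and `beta` initialized to zero the module reduces to the identity, which is how GRN is typically warm-started so it does not perturb early training.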
2. Compound Scaling and Model Variants
MedNeXt-v2 employs compound scaling, inspired by EfficientNet, with simultaneous scaling of depth, width (channels), and input context (patch size):
- For depth $d$, width $w$, and input patch size $r$, parameterized by a single compound coefficient $\phi$:
$d = d_0\,\alpha^{\phi}, \qquad w = w_0\,\beta^{\phi}, \qquad r = r_0\,\gamma^{\phi}$
where $d_0$, $w_0$, $r_0$ are baseline values and $\alpha$, $\beta$, $\gamma$ control scaling per dimension.
- Canonical Variants:
- Base: 52 layers, default channel width, baseline cubic patch size.
- Patch×1.5: 52 layers, default channel width, 1.5× enlarged cubic patch size (increased context).
- Width×2.0: 52 layers, double channel width, baseline cubic patch size.
Performance saturates under pure width scaling: the larger parameter budget of Width×2.0 does not surpass Patch×1.5 in mean segmentation accuracy, indicating that augmenting input context yields greater returns than naïve parameter scaling (Roy et al., 19 Dec 2025).
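The compound-scaling rule above amounts to a few lines of arithmetic; the sketch below illustrates it with hypothetical baseline values and coefficients (the papers do not report these exact numbers):

```python
def compound_scale(phi, d0=52, w0=32, r0=128, alpha=1.0, beta=1.0, gamma=1.0):
    """EfficientNet-style compound scaling for a 3D segmentation backbone.

    phi: single compound coefficient
    d0, w0, r0: baseline depth (layers), channel width, and cubic patch edge
                (illustrative values, not the paper's)
    alpha, beta, gamma: per-dimension scaling coefficients
    """
    depth = round(d0 * alpha ** phi)   # number of layers
    width = round(w0 * beta ** phi)    # base channel count
    patch = round(r0 * gamma ** phi)   # edge length of the cubic input patch
    return depth, width, patch
```

Setting only `gamma > 1` grows context (the Patch×1.5 regime), while setting only `beta > 1` grows width (the Width×2.0 regime), so the two canonical variants correspond to different single-axis choices of these coefficients.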
3. Training Protocols and Pretraining Regimes
Large-scale supervised pretraining is central to MedNeXt-v2’s utility:
- Pretraining: Conducted on 18,000 CT volumes from the CADS collection (44 anatomical labels), using z-score normalization, isotropic resampling, and extensive augmentations.
- Optimization: AdamW, batch size 8 (2 per GPU), 1500 epochs, distributed training with nnU-Net augmentation/loss protocols.
- Fine-tuning: Warmup for 50 epochs, then 250 further epochs on specific tasks/datasets, patch size variant-dependent.
- Task Coverage: Demonstrated on six CT/MR benchmarks representing 144 structures: pediatric organs, knee MR, CBCT dental classes, brain metastases, pancreatic tumor, and vertebrae (Roy et al., 19 Dec 2025).
Task-specific adaptations (e.g., EMedNeXt) use similar pretraining, followed by fine-tuning and parameter unfreezing strategies to accommodate low-resource settings or domain shifts (e.g., SSA low-field MRIs), sometimes freezing encoder and only unfreezing late decoder blocks (Jaheen et al., 31 Jul 2025).
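The freeze/unfreeze strategy for low-resource fine-tuning reduces to name-based selection over the model's parameters. The sketch below shows only that selection logic with a minimal stand-in class; the module names are hypothetical, and in a real framework (e.g. PyTorch) the same pattern would iterate `named_parameters()` and toggle `requires_grad`:

```python
class Param:
    """Minimal stand-in for a framework parameter with a trainability flag."""
    def __init__(self, name):
        self.name = name
        self.requires_grad = True

def apply_freeze_policy(params, unfreeze_prefixes):
    """Freeze everything, then re-enable gradients only for parameters whose
    (dotted) name starts with one of the given submodule prefixes."""
    for p in params:
        p.requires_grad = any(p.name.startswith(pre) for pre in unfreeze_prefixes)
    return params

# Hypothetical parameter names: encoder frozen, late decoder + head trainable.
params = [Param(n) for n in [
    "encoder.stage0.dwconv", "encoder.stage4.dwconv",
    "decoder.stage0.dwconv", "decoder.stage3.dwconv", "seg_head.conv",
]]
apply_freeze_policy(params, unfreeze_prefixes=("decoder.stage3", "seg_head"))
trainable = [p.name for p in params if p.requires_grad]
```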
4. Specialized Architectural and Training Innovations
Several key innovations distinguish MedNeXt-v2 and its derivatives:
- UpKern Algorithm: Allows kernel size expansion (e.g., from 3×3×3 to 5×5×5) without random reinitialization, by trilinearly interpolating the pretrained small-kernel weights:
$W_{k'} = \mathrm{TrilinearInterp}(W_k), \qquad k' > k,$
so the enlarged kernel inherits the learned spatial structure of the smaller one.
- Receptive Field Analysis: The receptive field after layer $l$ is formalized recursively as
$\mathrm{RF}_l = \mathrm{RF}_{l-1} + (k_l - 1) \prod_{i<l} s_i,$
with kernel sizes $k_l$ and strides $s_i$. Kernel upgrade increases the effective receptive field by up to 70% (from 67 voxels for 3×3×3 to 115 voxels for 5×5×5 in typical layouts) (Musah, 3 Aug 2025).
- Deep Supervision with Boundary-Aware Loss (EMedNeXt):
- Multi-level outputs supervised by Dice-Focal plus boundary loss at each output level.
- Combined loss: $\mathcal{L} = \sum_l w_l \big( \mathcal{L}_{\text{DiceFocal}}^{(l)} + \lambda\, \mathcal{L}_{\text{boundary}}^{(l)} \big)$, summing weighted Dice-Focal and boundary terms over the supervision levels $l$.
- Boundary loss leverages a channel-wise 3D Sobel operator, affording more precise boundary localization (Jaheen et al., 31 Jul 2025).
- Region of Interest (ROI) Enlargement: Enlarged input patches facilitate global context acquisition in the presence of large or diffuse pathologies (Jaheen et al., 31 Jul 2025).
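Two of the innovations above, UpKern-style kernel growth and the recursive receptive-field formula, can be sketched together in NumPy. This is an illustrative re-implementation under stated assumptions (align-corners trilinear sampling, weight layout `(Cout, Cin, k, k, k)`), not the authors' code:

```python
import numpy as np

def upkern_resize(w, k_new):
    """Grow a pretrained (Cout, Cin, k, k, k) conv kernel to k_new^3 by
    separable linear interpolation per spatial axis (trilinear overall),
    instead of random re-initialization."""
    def resize_axis(a, axis, n):
        old = a.shape[axis]
        src = np.linspace(0, old - 1, n)        # align-corners sampling grid
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, old - 1)
        t = src - lo                            # fractional offset in [0, 1)
        a_lo = np.take(a, lo, axis=axis)
        a_hi = np.take(a, hi, axis=axis)
        shape = [1] * a.ndim
        shape[axis] = n
        t = t.reshape(shape)
        return a_lo * (1 - t) + a_hi * t
    for ax in (2, 3, 4):                        # the three spatial axes
        w = resize_axis(w, ax, k_new)
    return w

def receptive_field(layers):
    """RF_l = RF_{l-1} + (k_l - 1) * prod(strides before layer l)."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf
```

With align-corners sampling the corner weights of the small kernel are reproduced exactly in the enlarged kernel, which is what lets the upgraded network start from (approximately) the pretrained function rather than from scratch.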
5. Quantitative Results and Benchmark Comparisons
MedNeXt-v2 consistently establishes new SOTA in multi-organ and pathological region segmentation:
| Pretrained Backbone | Mean DSC | Mean NSD | Reference |
|---|---|---|---|
| nnU-Net (scratch) | 80.57 | 78.49 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 (scratch) | 82.31 | 80.34 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 Base | 82.95 | 81.06 | (Roy et al., 19 Dec 2025) |
| MedNeXt-v2 Patch×1.5 | 83.70 | 81.77 | (Roy et al., 19 Dec 2025) |
| EMedNeXt (SSA Glioma) | 0.897* | 0.541* | (Jaheen et al., 31 Jul 2025) |
*Average Lesion-Wise DSC, NSD (0.5 mm tolerance), hidden SSA validation set.
For breast tumor segmentation in DCE-MRI:
- Dice scores: 0.67 (large-kernel ensemble, post-UpKern), up from 0.64 (baseline 3×3×3), NormHD improved to 0.24 (Musah, 3 Aug 2025).
Ablation establishes that backbone validation accuracy predicts pretraining transfer, and context scaling generally outperforms width scaling in mean accuracy (Roy et al., 19 Dec 2025).
6. Downstream Task Integration and Pipeline Extensions
MedNeXt-v2 supports a variety of downstream tasks and specialized pipelines:
- Radiomics-Driven pCR Classification: In breast DCE-MRI, 40 radiomic features from segmentations are passed to a self-normalizing all-FC network (SNN: [128, 64, 32] → 1, SELU activations). Yields 57% average balanced accuracy, peaking at 75% in certain subgroups (Musah, 3 Aug 2025).
- EMedNeXt Post-Processing: For low-resource MR, employs:
- Sliding-window inference with test-time augmentation.
- Class-specific probability thresholding and connected component pruning.
- Hierarchical label enforcement (e.g., requiring nested tumor sub-regions to remain mutually consistent).
- Priority-label fusion strategy to ensure robust, interpretable outputs (Jaheen et al., 31 Jul 2025).
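The connected-component pruning step can be sketched in plain NumPy with a BFS flood fill (a minimal stand-in; production pipelines would typically use `scipy.ndimage.label`, and the 6-connectivity and size threshold here are assumptions):

```python
import numpy as np
from collections import deque

def prune_small_components(mask, min_voxels):
    """Keep only 6-connected foreground components with >= min_voxels voxels."""
    mask = mask.astype(bool)
    visited = np.zeros_like(mask)
    out = np.zeros_like(mask)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for start in zip(*np.nonzero(mask)):
        if visited[start]:
            continue
        # BFS flood fill to collect one connected component
        comp, q = [start], deque([start])
        visited[start] = True
        while q:
            z, y, x = q.popleft()
            for dz, dy, dx in offsets:
                nz, ny, nx = z + dz, y + dy, x + dx
                if (0 <= nz < mask.shape[0] and 0 <= ny < mask.shape[1]
                        and 0 <= nx < mask.shape[2]
                        and mask[nz, ny, nx] and not visited[nz, ny, nx]):
                    visited[nz, ny, nx] = True
                    q.append((nz, ny, nx))
                    comp.append((nz, ny, nx))
        # Retain the component only if it clears the size threshold
        if len(comp) >= min_voxels:
            for v in comp:
                out[v] = True
    return out
```

In the full pipeline this pruning would run after probability thresholding and before hierarchical enforcement and priority-label fusion, discarding spurious speckle predictions that survive thresholding.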
7. Relationship to Previous MedNeXt and Research Implications
MedNeXt-v2 extends MedNeXt-v1, which was limited to fixed 3×3×3 kernels and lacked GRN, compound scaling, or explicit context scaling strategies:
- Kernel expansion with UpKern yields up to 70% receptive field growth and up to +0.03 Dice gain in challenging tasks (Musah, 3 Aug 2025).
- Advanced micro-architecture, especially GRN, and deep supervision have demonstrable impact on transfer learning and boundary precision.
- Representation scaling benefits are disproportionately large for pathological (vs. anatomical) segmentation tasks.
Pretraining generalizes across modality boundaries; fine-tuning with complete dataset access eliminates most benefits of modality-specific pretraining (Roy et al., 19 Dec 2025). A plausible implication is that domain-agnostic representations suffice given sufficient fine-tuning, diminishing the need for highly specialized pretraining pipelines.
References
- (Roy et al., 19 Dec 2025) MedNeXt-v2: Scaling 3D ConvNeXts for Large-Scale Supervised Representation Learning in Medical Image Segmentation
- (Musah, 3 Aug 2025) Large Kernel MedNeXt for Breast Tumor Segmentation and Self-Normalizing Network for pCR Classification in Magnetic Resonance Images
- (Jaheen et al., 31 Jul 2025) EMedNeXt: An Enhanced Brain Tumor Segmentation Framework for Sub-Saharan Africa using MedNeXt V2 with Deep Supervision