
Primary Tumor Segmentation in Medical Imaging

Updated 29 December 2025
  • Primary tumor segmentation is defined as the voxel-wise delineation of the main tumor region in medical images, crucial for accurate diagnosis and therapy planning.
  • It employs deep learning architectures, such as 3D U-Net variants with advanced loss functions, to achieve high boundary precision and sensitivity to small lesions.
  • Robust pipelines integrate multimodal data, rigorous preprocessing, and interactive refinement, addressing clinical variability and enhancing segmentation accuracy.

Primary tumor segmentation is a cornerstone task in computational medical image analysis, with direct impact on diagnosis, disease monitoring, and therapy planning across neuro-oncology, abdominal, pulmonary, and breast imaging. The problem entails the delineation of tumor regions—specifically, the region corresponding to the primary disease focus—within volumetric or slice-based medical imaging data such as MRI, CT, PET, or mammography. State-of-the-art solutions converge around deep learning methods, leveraging multimodal information, advanced loss formulations, and ensemble or hybrid architectures to improve robustness, boundary precision, and sensitivity to small or ambiguous lesions.

1. Problem Definition and Task Scope

The primary tumor segmentation task aims to produce voxel-wise (or pixel-wise) label masks that identify the main neoplastic lesion within an image volume, often discriminating tumor subregions (e.g., necrotic core, enhancing tumor) when supported by data. In the context of brain imaging benchmarks such as BraTS, each exam comprises rigidly aligned multimodal 3D MRIs (e.g., T1, T1c, T2, FLAIR, resampled to 1×1×1 mm³); the segmentation output consists of masks for whole tumor (WT), tumor core (TC), and enhancing tumor (ET) (Myronenko et al., 2020). Analogous setups exist for abdominal CT (LiTS: liver tumor), chest CT (LOTUS: lung tumor), and mammography (e.g., MIAS).

Segmentation performance is typically assessed on expert-annotated datasets, with specific test sets, cross-validation folds, and challenge leaderboards providing quantitative comparison across methods and years (Bilic et al., 2019, Gruber et al., 2019, Afshar et al., 2022).

2. Data and Preprocessing Pipelines

Standard datasets for primary tumor segmentation include the BraTS series (glioma; multimodal MRI), LiTS (liver cancer; CT), LOTUS (lung tumor; CT), and mammography datasets (MIAS; (Yousefikamal, 2019)). Data preparation involves a sequence of operations:

  • Spatial harmonization: Resampling to isotropic grids (e.g., 1 mm³).
  • Intensity normalization: Per-channel z-score normalization, HU windowing for CT ([–200,200] HU for liver, [–1000,400] HU for lung), contrast scaling for mammography (Bilic et al., 2019, Afshar et al., 2022, Yousefikamal, 2019).
  • Skull-stripping and rigid alignment: Critical for neuroimaging tasks to ensure cross-modality correspondence (Myronenko et al., 2020).
  • Patch extraction: To fit into limited GPU memory, 3D crops (e.g., 160×192×128 for brain; 128×192×160 for PET/CT) are widely employed during training (Myronenko et al., 2020, Cai et al., 2023).
  • Data augmentation: Random flipping, intensity scaling/shifting, elastic deformations, and affine augmentations are standard. Notably, simple augmentation suffices for robust performance in large, standardized datasets (Myronenko et al., 2020).
  • Label encoding: Multi-class masks for subregion labeling (e.g., glioma core, edema, enhancement) vs. binary masks for organ-specific tasks (Moradi et al., 22 Nov 2024, Pajouh et al., 23 Oct 2025).
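The intensity and patch-extraction steps above can be sketched as follows. This is a minimal illustration, not the cited pipelines: the function names, the foreground-masked normalization, and the default window bounds (the liver CT setting from above) are all illustrative choices.

```python
import numpy as np

def window_hu(volume, lo=-200.0, hi=200.0):
    """Clip CT intensities to a Hounsfield window (liver setting above)."""
    return np.clip(volume, lo, hi)

def zscore(volume, eps=1e-8):
    """Z-score normalization over nonzero (foreground) voxels,
    leaving background at zero; assumes a nonempty foreground."""
    mask = volume != 0
    mu, sigma = volume[mask].mean(), volume[mask].std()
    out = np.zeros_like(volume, dtype=np.float32)
    out[mask] = (volume[mask] - mu) / (sigma + eps)
    return out

def random_crop(volume, label, size=(160, 192, 128), rng=None):
    """Extract a random 3D training patch from an image/label pair."""
    rng = rng or np.random.default_rng(0)
    starts = [rng.integers(0, s - c + 1) for s, c in zip(volume.shape, size)]
    sl = tuple(slice(st, st + c) for st, c in zip(starts, size))
    return volume[sl], label[sl]
```

In a real pipeline these steps run per channel after resampling, with the crop size matched to GPU memory as noted above.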

3. Network Architectures and Methodological Frameworks

3.1 Canonical Encoder-Decoder Models (3D U-Net and Variants)

The backbone for the majority of state-of-the-art segmentation models is the encoder–decoder (U-Net style) architecture, implemented in 2D, 2.5D, or 3D (Myronenko et al., 2020, Cabezas et al., 2018). Typical features include:

  • Multiple downsampling and upsampling stages, each comprising blocks of 3×3×3 convolutions (3D), normalization (instance or group norm outperforms batch norm for small batch sizes), and identity-residual or dense connections for gradient stability and multi-context extraction (Myronenko et al., 2020, Ahmad et al., 2020, Pajouh et al., 23 Oct 2025).
  • Specialized blocks for multi-context awareness, such as Residual-Inception or Dilated Inception modules, enable integration of local and global features (Ahmad et al., 2020, Cahall et al., 2021).
  • Transformer-inspired modules (e.g., MedNeXt) further enhance representation of variable tumor morphology, particularly in head and neck segmentation (Moradi et al., 22 Nov 2024).

3.2 Loss Functions and Optimization

Modern loss formulations mix region-based and boundary-aware terms to enforce both overlap and shape fidelity:

  • Soft-Dice loss: $L_{dice} = 1 - \frac{2 \sum_i p_i g_i}{\sum_i p_i^2 + \sum_i g_i^2 + \epsilon}$ for each class (Myronenko et al., 2020).
  • Focal loss: $L_{focal} = -\frac{1}{N} \sum_i (1-p_i)^\gamma \, g_i \log(p_i+\epsilon)$ with $\gamma=2$, to counter class imbalance, which is especially vital for small or enhancing tumor regions (Myronenko et al., 2020).
  • Active contour loss (ACL): Volume and length penalties to sharpen boundaries and align predicted contours with ground truth (Myronenko et al., 2020).
  • Adversarial/uncertainty regularization: Reciprocal adversarial training with a patch-level critic, combined with virtual adversarial input noise, has been shown to yield smoother and more accurate boundaries (Peiris et al., 2022, Zhang et al., 7 Mar 2025).
  • Hybrid objectives: Summing Dice, focal, and boundary-aware terms typically yields the best synergy; equal weighting is often selected empirically (Myronenko et al., 2020).
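The soft-Dice and focal terms above can be sketched in NumPy (forward pass only; a training implementation would use an autodiff framework). The function names and the equal-weighted hybrid are illustrative:

```python
import numpy as np

def soft_dice_loss(p, g, eps=1e-5):
    """L_dice = 1 - 2*sum(p*g) / (sum(p^2) + sum(g^2) + eps)."""
    p, g = p.ravel(), g.ravel()
    return 1.0 - (2.0 * np.sum(p * g)) / (np.sum(p**2) + np.sum(g**2) + eps)

def focal_loss(p, g, gamma=2.0, eps=1e-8):
    """L_focal = -(1/N) * sum((1-p)^gamma * g * log(p + eps))."""
    p, g = p.ravel(), g.ravel()
    return -np.mean((1.0 - p)**gamma * g * np.log(p + eps))

def hybrid_loss(p, g, w_dice=1.0, w_focal=1.0):
    """Equal-weighted sum of region- and sample-aware terms."""
    return w_dice * soft_dice_loss(p, g) + w_focal * focal_loss(p, g)
```

A perfect prediction drives both terms to zero; the $(1-p_i)^\gamma$ factor down-weights easy voxels so that rare foreground classes dominate the gradient.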

Training employs Adam or SGD with learning rate scheduling (polynomial or cosine decay), L2 weight decay, and spatial dropout for regularization; convergence is monitored via validation Dice or boundary metric plateaus (Myronenko et al., 2020, Moradi et al., 22 Nov 2024).

3.3 Ensemble and Hybrid Systems

Segmentation ensembles—across architectures or training runs—yield small but consistent boosts in overlap and boundary accuracy, balancing oversegmentation (e.g., dual-decoder U-Nets) and conservative predictions (e.g., plain SegResNets) (Pajouh et al., 23 Oct 2025, Cabezas et al., 2018).
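Prediction-level ensembling can be sketched as averaging per-voxel foreground probabilities across models and thresholding the mean. This assumes each model is a callable returning a probability array; the helper name is illustrative:

```python
import numpy as np

def ensemble_segment(volume, models, threshold=0.5):
    """Average per-voxel foreground probabilities across models,
    then threshold to obtain the final binary mask."""
    probs = np.stack([m(volume) for m in models], axis=0)  # (M, D, H, W)
    mean_prob = probs.mean(axis=0)
    return (mean_prob >= threshold).astype(np.uint8)
```

Averaging probabilities (rather than majority-voting hard masks) lets a confident model outvote an uncertain one, which is one way the oversegmenting and conservative members described above can balance each other.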

Cascaded pipelines (coarse ROI localization followed by focused subregion refinement) help reduce false positives and enhance performance on hard-to-segment enhancing or necrotic cores (Cabezas et al., 2018).
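The cascade above can be sketched as ROI extraction from a coarse mask followed by refinement on the crop; `coarse_model` and `fine_model` are hypothetical callables, and the bounding-box-with-margin localization is one simple choice:

```python
import numpy as np

def cascade_segment(volume, coarse_model, fine_model, margin=2):
    """Two-stage cascade: localize a ROI from the coarse mask,
    then run the fine model only on the cropped region."""
    coarse = coarse_model(volume).astype(bool)
    if not coarse.any():
        return np.zeros_like(volume, dtype=np.uint8)
    lo = np.maximum(np.argwhere(coarse).min(axis=0) - margin, 0)
    hi = np.minimum(np.argwhere(coarse).max(axis=0) + 1 + margin, volume.shape)
    sl = tuple(slice(a, b) for a, b in zip(lo, hi))
    out = np.zeros_like(volume, dtype=np.uint8)
    out[sl] = fine_model(volume[sl])
    return out
```

Restricting the fine model to the ROI is what suppresses distant false positives while concentrating capacity on the hard subregions.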

4. Domain-Specific Extensions and Adaptations

4.1 Multimodal Integration

  • Multi-contrast MRI: Explicit attention mechanisms (task-oriented prompt attention, TPA) and joint prompt learning improve discrimination of subregions that are variably conspicuous across T1, T1c, T2, FLAIR (Zhang et al., 7 Mar 2025).
  • Modality completion: Synthesis of missing MRI contrasts via conditional GANs or 3D U-Nets can recover segmentation fidelity when one or more images are missing—yielding 2–3% Dice gain across tumor regions (Li et al., 2023).

4.2 Interactive and Weakly-Supervised Methods

  • Interactive click-based refinement: Two-stage frameworks allow radiologist correction of automated masks via point prompts, achieving significant Dice improvement after a handful of user interactions (e.g., 0.713 to 0.824 in OPC GTVp, five clicks) (Saukkoriipi et al., 10 Sep 2024). Simulation of errors and click selection is routinely used to augment training.
  • Few-shot and uncertainty collaborative learning: Explicit modeling of inter-contrast feature interaction and Monte Carlo dropout for uncertainty estimation enhance label efficiency, especially under limited annotation (Zhang et al., 7 Mar 2025).
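One common heuristic for simulating corrective clicks during training, not necessarily the scheme of the cited two-stage framework, places the next click near the center of the current error region, with a polarity indicating whether foreground should be added or removed:

```python
import numpy as np

def next_click(pred, gt):
    """Simulate a corrective click: return coordinates near the center
    of mass of the prediction/ground-truth disagreement region, plus a
    polarity (+1 = add foreground, -1 = remove); None if masks agree."""
    error = pred.astype(bool) ^ gt.astype(bool)
    if not error.any():
        return None
    coords = np.argwhere(error)
    center = coords.mean(axis=0).round().astype(int)
    # snap to the nearest actual error voxel
    idx = np.abs(coords - center).sum(axis=1).argmin()
    point = tuple(coords[idx])
    polarity = 1 if gt[point] else -1
    return point, polarity
```

Iterating this loop, predict, sample a click on the largest remaining error, re-predict, mimics the handful of user interactions reported above.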

4.3 Non-Deep Learning Approaches

  • Patch- and symmetry-based classical segmentation: For certain anatomies and imaging modalities, feature-based pipelines using discrete wavelet transforms, SVM/Random Forest detection, and contralateral thresholding remain competitive when annotation resources are limited (Gupta et al., 2017).
  • Level set and spatial fuzzy clustering: For mammography, coupling spatial FCM with active contour initialization reduces speckle and manual effort in region growing (Yousefikamal, 2019).

5. Performance Metrics and Benchmark Results

Segmentation quality is primarily measured by:

  • Dice similarity coefficient (DSC): voxel-wise overlap between predicted and reference masks, reported per region (e.g., WT, TC, ET) or per slice.
  • 95th-percentile Hausdorff distance (HD95): boundary agreement in millimeters, robust to outlier voxels.
  • Lesion-wise detection: recall and precision for identifying individual tumors, relevant for multifocal disease (Bilic et al., 2019).

Representative quantitative results (BraTS / HNTS / LOTUS / LiTS; test sets and recent challenge events):

| Task/Anatomy | SOTA Dice (primary/WT) | Subregion Dice | Boundary HD95 (mm) | Notes |
|---|---|---|---|---|
| Brain/BraTS (U-Net + Dice/focal) | 0.894 (WT) | 0.800 (ET), 0.834 (TC) | 2–6.5 (ensemble) | 0.826 (ET, test set, ensemble) (Myronenko et al., 2020) |
| Brain/adversarial-uncertainty TUCL | 0.882 (ET, 30% labels) | N/A | 10.85 (ET) | Prompt attention + dual-path uncertainty (Zhang et al., 7 Mar 2025) |
| Head/Neck (MedNeXt-S) | 0.8066 (GTVp) | 0.7889 (nnUNet baseline) | -- | MICCAI HNTS, T2 MRI (Moradi et al., 22 Nov 2024) |
| Lung/LOTUS (best team) | 0.59 (test slice-DSC) | -- | ~0.08 (1/H95) | High variability in metric distribution (Afshar et al., 2022) |
| Liver/LiTS (primary/overall) | 0.739 (MICCAI18) | Lesion recall 0.554 | -- | 0.674–0.739 (Dice across challenge events) (Bilic et al., 2019) |
| Meningioma/Ensemble | 0.773 (ET, test) | 0.763 (TC), 0.739 (WT) | -- | 20-epoch trio ensemble on multimodal MRI (Pajouh et al., 23 Oct 2025) |
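The overlap and boundary metrics in the table above can be computed as sketched below. The brute-force HD95 is for illustration only; practical toolkits use distance transforms and account for voxel spacing in millimeters:

```python
import numpy as np

def dice_score(a, b, eps=1e-8):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)

def hd95(a, b):
    """95th-percentile symmetric Hausdorff distance between the
    foreground voxel sets of two binary masks (brute force)."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(np.percentile(d.min(axis=1), 95),
               np.percentile(d.min(axis=0), 95))
```

Dice rewards bulk overlap and is insensitive to boundary detail, while HD95 penalizes the worst 5% of boundary deviations, which is why the two are reported together.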

6. Limitations, Open Challenges, and Best Practices

Limitations and Challenges

  • Small and low-contrast lesions: Deep models (especially 2D/2.5D) often underperform on sub-centimeter or low-contrast tumors; ensemble or 3D context modeling partially mitigates this (Bilic et al., 2019, Afshar et al., 2022).
  • Generalization: Domain shifts (equipment, protocol, anatomy) can degrade performance; normalization and domain adaptation strategies are critical (Li et al., 2023).
  • Annotation variability: Inter-observer differences in mask creation limit achievable accuracy; multi-reader datasets or explicit uncertainty modeling are advised (Afshar et al., 2022).

Best Practices

  • Prefer instance/group normalization when batch size is limited (Myronenko et al., 2020).
  • Increasing channel width in network backbones gives more consistent gains than simply deepening architectures (Myronenko et al., 2020).
  • Simple augmentations (mirroring, scaling, shifting) are sufficient in most high-quality, multimodal datasets (Myronenko et al., 2020).
  • Synergistic combination of region-, sample-, and boundary-aware loss terms is superior to relying on any single loss (Myronenko et al., 2020).
  • For resource-constrained environments, 2D MIP-based or lightweight ensemble models can achieve near-parity with heavy 3D U-Nets at a fraction of compute cost (Zarik et al., 10 Oct 2025, Pajouh et al., 23 Oct 2025).

7. Future Directions

Current trends in primary tumor segmentation include the adoption of advanced attention and prompt-learning modules (e.g., TPA), uncertainty-guided refinement, domain-adaptive or federated training for robust generalization, and integration of multimodal synthetic data to maximize performance under incomplete inputs (Zhang et al., 7 Mar 2025, Li et al., 2023). There is increasing interest in interactive and human-in-the-loop workflows to blend automation with expert correction (2S-ICR, SAM prompt refinement) (Saukkoriipi et al., 10 Sep 2024, Zhang et al., 2023), as well as continual benchmarking on large, diverse, and challenging datasets spanning multiple imaging modalities.

Continued research in loss function design, small lesion sensitivity, multi-organ adaptation, and real-time efficiency will define the next generation of clinically robust primary tumor segmentation systems.
