TotalSegmentator: Deep Learning Segmentation

Updated 26 May 2026

TotalSegmentator is an open-source segmentation toolkit that automatically delineates up to 104 anatomical structures in CT and MRI scans, providing reproducible results across modalities.
It employs a cascaded 3D nnU-Net architecture with low-resolution atlas prediction, high-resolution patch refinement, and fusion of overlapping predictions to enhance segmentation precision.
Key applications include quantitative imaging biomarker extraction, surgical planning, and radiomics, with high Dice scores and robust generalizability validated on public datasets.

TotalSegmentator is an open-source, deep learning–based segmentation toolkit providing multi-structure, fully automatic delineation of anatomical entities in cross-sectional CT and MRI images. Originating from the work of Wasserthal et al. and subsequent extensions, it enables precise, reproducible segmentation of up to 104 anatomical structures, driving downstream applications from quantitative imaging biomarkers to surgical planning and large-scale population studies (Wasserthal et al., 2022, D'Antonoli et al., 2024). The framework builds on the nnU-Net architecture, whose data-driven configuration supports robust generalization, and it is backed by validated public datasets and reproducible pipelines.

1. Model Architecture and Framework

TotalSegmentator utilizes a cascaded 3D nnU-Net architecture with an ensemble of U-Net–style convolutional neural networks. The core pipeline comprises three principal stages (Finocchiaro et al., 28 Feb 2025, Wasserthal et al., 2022):

Low-resolution atlas stage: An initial whole-volume network predicts coarse anatomical masks.
High-resolution patch refinement: Organ-centered, high-resolution networks further polish regional boundaries.
Fusion and reconciliation: Overlapping predictions are consolidated, e.g., resolving boundaries between thoracic, abdominal, and pelvic groups.
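The fusion stage can be illustrated with a toy sketch. The source does not specify the reconciliation rule, so the rule used here (highest probability wins, with a background threshold) is an assumption, as are the data layout, the function name `fuse_group_predictions`, and the label IDs.

```python
from typing import Dict, List

# Hypothetical layout: each group model returns, for every voxel, a dict
# {global_label_id: probability}. Label 0 is reserved for background.
GroupPrediction = List[Dict[int, float]]


def fuse_group_predictions(groups: List[GroupPrediction],
                           min_prob: float = 0.5) -> List[int]:
    """Merge per-group predictions into a single label per voxel.

    Assumed rule (not taken from the source): the label with the highest
    probability across all groups wins; a voxel stays background (0) when
    no label exceeds min_prob.
    """
    if not groups:
        return []
    n_voxels = len(groups[0])
    if any(len(g) != n_voxels for g in groups):
        raise ValueError("all group predictions must cover the same voxels")

    fused: List[int] = []
    for v in range(n_voxels):
        best_label, best_prob = 0, min_prob
        for group in groups:
            for label, prob in group[v].items():
                if label != 0 and prob > best_prob:
                    best_label, best_prob = label, prob
        fused.append(best_label)
    return fused


# Three voxels, two overlapping group models (label IDs 5 and 7 are made up).
thoracic = [{5: 0.9}, {5: 0.6}, {}]
abdominal = [{}, {7: 0.7}, {7: 0.3}]
print(fuse_group_predictions([thoracic, abdominal]))  # [5, 7, 0]
```

In the middle voxel both groups claim the location and the more confident one is kept; the last voxel falls below the threshold and remains background.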

All models are trained with the sum of a soft Dice loss and a cross-entropy loss. For MRI, the training configuration is modified by disabling mirror augmentation and applying zero-mean, unit-variance intensity normalization, which suits the variability between sequences. To manage compute resources, classes are split into groups (e.g., two 28-class models for MRI; five groups for CT at 1.5 mm) (D'Antonoli et al., 2024, Wasserthal et al., 2022). The default deployment typically omits 5-fold ensembling in favor of runtime efficiency.
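As a minimal sketch of the combined objective, the following pure-Python functions compute a soft Dice term and a cross-entropy term over a flat list of voxels and add them. The "1 minus mean Dice" form, the smoothing constants, and the function names are assumptions for illustration; nnU-Net's own implementation differs in detail (for example, batching and tensor operations).

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], targets: Sequence[int],
                  eps: float = 1e-7) -> float:
    """Mean negative log-probability assigned to the true class of each voxel."""
    total = sum(math.log(max(p[t], eps)) for p, t in zip(probs, targets))
    return -total / len(targets)


def soft_dice_loss(probs: Sequence[Sequence[float]], targets: Sequence[int],
                   n_classes: int, eps: float = 1e-5) -> float:
    """1 minus the soft Dice averaged over classes (assumed formulation)."""
    dice_per_class = []
    for c in range(n_classes):
        # Soft intersection: predicted probability mass on voxels of class c.
        intersection = sum(p[c] for p, t in zip(probs, targets) if t == c)
        denom = sum(p[c] for p in probs) + sum(1 for t in targets if t == c)
        dice_per_class.append((2.0 * intersection + eps) / (denom + eps))
    return 1.0 - sum(dice_per_class) / n_classes


def dice_ce_loss(probs: Sequence[Sequence[float]], targets: Sequence[int],
                 n_classes: int) -> float:
    """Sum of the soft Dice loss and the cross-entropy loss."""
    return soft_dice_loss(probs, targets, n_classes) + cross_entropy(probs, targets)


targets = [0, 1]
perfect = [[1.0, 0.0], [0.0, 1.0]]      # confident and correct
uncertain = [[0.5, 0.5], [0.5, 0.5]]    # maximally unsure
print(dice_ce_loss(perfect, targets, 2) < dice_ce_loss(uncertain, targets, 2))  # True
```

The Dice term rewards overlap per class regardless of class size, while the cross-entropy term penalizes each voxel individually, which is the usual motivation for combining them.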

2. Datasets, Scope, and Label Taxonomy

The initial CT model was trained on 1,204 curated CT studies spanning 8 clinical sites and 16 scanner models (Siemens-dominated) sampled from routine practice across years 2012–2020 (Wasserthal et al., 2022). MRI extensions introduce >300 clinical MR scans across multiple sites, vendors, and sequence types, supplemented with a representative CT subset for shape priors, forming a combined MR+CT training set (D'Antonoli et al., 2024).

The label taxonomy encompasses:

CT: 104 targets—27 organs (e.g., liver, spleen, pancreas), 59 bones, 10 muscles, 8 vessels.
MRI: 80 targets covering major abdominal, pelvic, and soft-tissue structures, aligned with clinical use cases (organ volumetry, cross-modality research).

Segmentations are stored as multi-class NIfTI volumes and per-structure binary masks. Public, curated ground-truth datasets support both modeling and benchmarking.
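The relationship between the two storage forms can be sketched as follows: a multi-class label map is split into one binary mask per structure. The label IDs in `LABEL_NAMES` are hypothetical placeholders, and the volume is flattened to a plain list to keep the example dependency-free; real volumes would be read from NIfTI files with an imaging library.

```python
from typing import Dict, List

# Hypothetical label IDs, for illustration only; the actual ID-to-name
# mapping is defined by the toolkit and is not given in this article.
LABEL_NAMES: Dict[int, str] = {1: "spleen", 2: "liver"}


def split_label_map(label_map: List[int],
                    label_names: Dict[int, str]) -> Dict[str, List[int]]:
    """Return one binary (0/1) mask per named structure.

    label_map is a flattened multi-class volume in which each voxel holds
    a single integer label (0 = background).
    """
    return {
        name: [1 if voxel == label_id else 0 for voxel in label_map]
        for label_id, name in label_names.items()
    }


masks = split_label_map([0, 1, 2, 2, 0], LABEL_NAMES)
print(masks["liver"])   # [0, 0, 1, 1, 0]
print(masks["spleen"])  # [0, 1, 0, 0, 0]
```

Because each voxel carries exactly one label in the multi-class form, the resulting binary masks never overlap.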

3. Quantitative Performance and Benchmarks

Performance is reported using standard overlap and boundary-based metrics:

Dice Similarity Coefficient: $\text{Dice}(A, B) = \frac{2|A \cap B|}{|A| + |B|}$
Normalized Surface Distance (NSD): Fraction of surface points within a 3 mm margin.
Average Symmetric Surface Distance (ASSD), 95th Percentile Hausdorff Distance (HD₉₅): For fine boundary analysis (particularly colon/lung) (Finocchiaro et al., 28 Feb 2025, Lee et al., 18 Sep 2025).
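To make the first two metrics concrete, here is a small voxel-based sketch of Dice and NSD on masks represented as sets of integer voxel coordinates. The surface definition (voxels with a 6-connected neighbor outside the mask), the brute-force distance search, and the default 1.5 mm spacing are simplifying assumptions; reference implementations typically work on surface elements and use distance transforms.

```python
import math
from typing import Set, Tuple

Voxel = Tuple[int, int, int]
# 6-connected neighborhood used to decide whether a voxel lies on the surface.
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice(A, B) = 2|A ∩ B| / (|A| + |B|); two empty masks count as a match."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask: Set[Voxel]) -> Set[Voxel]:
    """Voxels of the mask that have at least one neighbor outside the mask."""
    return {
        (x, y, z) for (x, y, z) in mask
        if any((x + dx, y + dy, z + dz) not in mask for dx, dy, dz in NEIGHBORS)
    }


def _count_within(points, others, spacing, tol_mm) -> int:
    """Number of points lying within tol_mm of any point in others."""
    def dist_mm(p, q):
        return math.sqrt(sum(((p[i] - q[i]) * spacing[i]) ** 2 for i in range(3)))
    return sum(1 for p in points if any(dist_mm(p, q) <= tol_mm for q in others))


def nsd(a: Set[Voxel], b: Set[Voxel],
        spacing=(1.5, 1.5, 1.5), tol_mm: float = 3.0) -> float:
    """Fraction of surface voxels of both masks within tol_mm of the other surface."""
    sa, sb = surface(a), surface(b)
    if not sa and not sb:
        return 1.0
    if not sa or not sb:
        return 0.0
    close = _count_within(sa, sb, spacing, tol_mm) + _count_within(sb, sa, spacing, tol_mm)
    return close / (len(sa) + len(sb))


def cube(origin: Voxel, size: int) -> Set[Voxel]:
    ox, oy, oz = origin
    return {(ox + i, oy + j, oz + k)
            for i in range(size) for j in range(size) for k in range(size)}


reference = cube((0, 0, 0), 4)
prediction = cube((1, 0, 0), 4)      # same cube shifted by one voxel (1.5 mm)
print(dice(reference, prediction))   # 0.75
print(nsd(reference, prediction))    # 1.0
```

The example shows why the two metrics are reported together: a one-voxel shift costs a quarter of the Dice score on this small object, yet every surface voxel stays inside the 3 mm tolerance, so NSD is unaffected.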

Key results include:

CT, 1.5 mm model: Mean Dice = 0.943 [0.938, 0.947] and NSD = 0.966 [0.962, 0.971] on the 65-case test set; on the BTCV challenge data, Dice = 0.932 versus 0.871 for the baseline nnU-Net (p<0.001) (Wasserthal et al., 2022).
MRI model: Dice = 0.824 [0.801, 0.842] and NSD = 0.882 [0.860, 0.900] on the internal abdominal test set. In an extended validation, CT Dice = 0.96 and NSD = 0.994 (D'Antonoli et al., 2024).
Application-specific endpoints:
- Colon segmentation (CT): TotalSegmentator shows ASSD ≈ 4.03 mm, HD₉₅ ≈ 17.97 mm (air-filled lumen); HQColon improves these to 0.12 mm and 1.0 mm (Finocchiaro et al., 28 Feb 2025).
- Lung: Mean Dice of 0.97 in mild and 0.94 in moderate-to-severe disease, with a pronounced drop-off under heavy pathology; Unet-R231 performs better, while TotalSegmentator substantially outperforms MedSAM (Lee et al., 18 Sep 2025).
- Muscle & Fat (body composition): Subcutaneous fat Dice ≈ 0.81, muscle ≈ 0.83; Cohen’s Kappa ≈ 0.86 for visceral fat vs. internal specialized tool (Hou et al., 2024).

4. Applications and Integration in Research Workflows

TotalSegmentator provides multi-structure masks for:

Organ volumetry and attenuation analysis, including disease biomarker studies (e.g., CT-IDP framework extracts 900+ phenotype descriptors per scan) (Dahal et al., 9 May 2026).
Surgical and radiation planning: Detailed, reproducible anatomy boundaries and STL mesh export for 3D modeling (D'Antonoli et al., 2024).
Body composition: Automated muscle, fat, and compartment quantification at population scale (Hou et al., 2024).
Radiomics, opportunistic screening, and cross-sectional imaging studies across CT/MRI.
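The volumetry and attenuation measurements listed above reduce to simple arithmetic on a binary mask. A minimal sketch, assuming a flattened mask, a matching flattened list of Hounsfield unit values, and known voxel spacing (the function names are illustrative, not part of any toolkit API):

```python
from typing import Sequence, Tuple


def organ_volume_ml(mask: Sequence[int],
                    spacing_mm: Tuple[float, float, float]) -> float:
    """Volume in millilitres: voxel count times voxel volume (1 mL = 1000 mm^3)."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    n_voxels = sum(1 for m in mask if m)
    return n_voxels * voxel_mm3 / 1000.0


def mean_attenuation_hu(mask: Sequence[int], hu: Sequence[float]) -> float:
    """Mean Hounsfield value over the voxels selected by the mask."""
    values = [h for m, h in zip(mask, hu) if m]
    if not values:
        raise ValueError("mask selects no voxels")
    return sum(values) / len(values)


mask = [1, 1, 0, 1]
hu_values = [50.0, 60.0, -100.0, 70.0]
print(organ_volume_ml(mask, (1.5, 1.5, 1.5)))   # three voxels of 3.375 mm^3 each
print(mean_attenuation_hu(mask, hu_values))     # 60.0
```

Both quantities inherit any error in the mask, so boundary inaccuracies translate directly into volume and attenuation bias.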

The pipeline is fully containerized and distributed as a Python CLI (totalsegmentator), with direct integration in DICOM/NIfTI workflows and interfaces for PACS/RIS and batch deployment.
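For batch use, the CLI can be driven from Python. The sketch below only assembles the argument list and runs it on request. The executable name `TotalSegmentator` and the flags `-i`, `-o`, `--task`, and `--fast` follow the project's README as best recalled and are assumptions here; check them against the installed version's `--help` output. The helper names `build_command` and `run` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(input_path: str, output_dir: str,
                  task: Optional[str] = None, fast: bool = False,
                  executable: str = "TotalSegmentator") -> List[str]:
    """Assemble the CLI call as an argument list (nothing is executed here).

    The executable name and flag names are assumptions; verify them
    against the installed version before use.
    """
    cmd = [executable, "-i", input_path, "-o", output_dir]
    if task:
        cmd += ["--task", task]
    if fast:
        cmd.append("--fast")
    return cmd


def run(cmd: List[str]) -> None:
    """Run the command, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


print(build_command("ct.nii.gz", "segmentations"))
# ['TotalSegmentator', '-i', 'ct.nii.gz', '-o', 'segmentations']
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file paths that contain spaces.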

5. Generalization, Robustness, and Sequence Independence

TotalSegmentator demonstrates strong generalizability by design:

Sequence robustness: MRI models perform comparably across T1-, T2-, and proton-density–weighted exams, isotropic and anisotropic resolutions, and images from different scanners, with only extreme sequences degrading performance (D'Antonoli et al., 2024).
Inter-institutional transfer: Models validated across internal and public data (AMOS, CHAOS) and diverse patient populations.
Loss of performance is documented in high-complexity contexts—severe lung disease, high-frequency tubular boundaries (colon), and extremely small/low-contrast features (e.g., <2 cc cysts, fine bowel loops) (Finocchiaro et al., 28 Feb 2025, Lee et al., 18 Sep 2025, Wasserthal et al., 2022).

Ablation studies support the added benefit of cross-modality training (CT+MR), though highly specialized, organ-centric models (e.g., Unet-R231 for lung, HQColon for colon) may outperform TotalSegmentator when targeting the most challenging detail.

6. Limitations, Failure Modes, and Comparative Assessments

Primary limitations arise from the cascade design and label granularity:

Coarse boundary delineation: Loss of sub-organ details such as haustral folds or fluid pockets in the colon, under-segmentation of peripheral lung in severe pathology (Finocchiaro et al., 28 Feb 2025, Lee et al., 18 Sep 2025).
Segmentation drop-out: Missed segments in convoluted or collapsed anatomy, especially in colon and small vessels (Wasserthal et al., 2022).
Compartment ambiguity: Single-class colon masks cannot distinguish fluid-filled from air-filled lumen (Finocchiaro et al., 28 Feb 2025).
Quantitative phenotype fidelity: Downstream feature computation (e.g., in CT-IDP) is limited by the accuracy of the upstream segmentations; small or low-contrast lesions may be missed by the model and are then absent from the downstream phenotypic descriptors (Dahal et al., 9 May 2026).

Comparative studies indicate clear utility as a general-purpose solution, but task-optimized models (HQColon, Unet-R231, internal fat/muscle pipelines) can outperform TotalSegmentator in their respective domains (Finocchiaro et al., 28 Feb 2025, Hou et al., 2024, Lee et al., 18 Sep 2025).

7. Software Distribution, Public Resources, and Community Adoption

TotalSegmentator is distributed as an open-source Python package (pip installable), with all code, pretrained models, and annotation datasets available on GitHub and Zenodo (Wasserthal et al., 2022, D'Antonoli et al., 2024). Both CT and MRI versions are supported, with public web-based demo endpoints. Output includes multi-label volumes and per-structure NIfTI masks, enabling straightforward integration with research and clinical imaging platforms. Memory and runtime profiles (~1–3 min per scan, <12 GB RAM, optional GPU acceleration) support deployment in both research and clinical environments.

The toolkit is widely adopted in large-scale imaging studies and as a foundation for quantitative imaging research, with extensive use in independent model benchmarking and as a reference method in task-specific comparative works (Hou et al., 2024, Finocchiaro et al., 28 Feb 2025, Dahal et al., 9 May 2026).

References:

"TotalSegmentator: robust segmentation of 104 anatomical structures in CT images" (Wasserthal et al., 2022)
"TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI" (D'Antonoli et al., 2024)
"HQColon: A Hybrid Interactive Machine Learning Pipeline for High Quality Colon Labeling and Segmentation" (Finocchiaro et al., 28 Feb 2025)
"Enhanced Muscle and Fat Segmentation for CT-Based Body Composition Analysis: A Comparative Study" (Hou et al., 2024)
"Transplant-Ready? Evaluating AI Lung Segmentation Models in Candidates with Severe Lung Disease" (Lee et al., 18 Sep 2025)
"CT-IDP: Segmentation-Derived Quantitative Phenotypes for Interpretable Abdominal CT Disease Classification" (Dahal et al., 9 May 2026)