VIBESegmentator: Threshold & Deep Segmentation
- VIBESegmentator is a collection of image segmentation methods combining a deterministic, threshold-based super-pixel approach for material textures with a deep nnU-Net pipeline for volumetric MRI tissue segmentation.
- The threshold-based method partitions images into regions using global intensity thresholds, connected-component labeling, and optional morphological operations, with reported pixel accuracy up to $0.94$ on vesicular material images.
- The deep learning variant uses a 3D nnU-Net architecture to produce 71 semantic masks, with a reported mean Dice of $0.89$ on 12 hold-out subjects, supporting large-scale medical imaging studies and registration tasks.
VIBESegmentator is a collection of image segmentation methodologies developed for extracting quantitative tissue and super-pixel labels from visual data. Two distinct methodological lineages share this name: (1) a classical thresholding-based connected-component super-pixel approach for vesicular texture analysis in materials science; and (2) a high-capacity, nnU-Net-based deep learning pipeline established for volumetric tissue segmentation in body MRI, optimized for large medical biobanks. Both variants are used in research, and they differ in algorithm, representation, and performance profile.
1. Threshold-Based VIBESegmentator for Vesicular Texture Segmentation
The original VIBESegmentator, introduced for vesicular texture analysis, is a deterministic, unsupervised segmentation method targeting cavity-rich material imagery. Segmentation is approached as partitioning an input image into super-pixels by global intensity thresholding, followed by connected-component labeling, and optional morphological post-processing (Sparavigna, 2016).
Theoretical Workflow
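- Preprocessing and Brightness Map Calculation: An input RGB image of dimensions $N_x \times N_y$ with discrete channel values $R(i,j)$, $G(i,j)$, $B(i,j)$, for $i = 1,\dots,N_x$ and $j = 1,\dots,N_y$, is converted to a single-channel grayscale (brightness) image:
$$\beta(i,j) = \frac{R(i,j) + G(i,j) + B(i,j)}{3}$$
- Optional Global Statistics: Image mean and standard deviation may be computed for guidance in threshold selection:
$$\mu_\beta = \frac{1}{N_x N_y}\sum_{i,j}\beta(i,j), \qquad \sigma_\beta = \sqrt{\frac{1}{N_x N_y}\sum_{i,j}\bigl(\beta(i,j) - \mu_\beta\bigr)^2}$$
- Thresholding: Pixels are binarized with threshold $T$:
$$b(i,j) = \begin{cases} 0, & \beta(i,j) \le T \\ 1, & \beta(i,j) > T \end{cases}$$
where $b(i,j) = 0$ (black) marks the object/foreground.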
- Connected-Component Labeling: A raster scan applies 4-neighborhood connectivity, checking only the already-visited neighbors (above, left). For each black pixel, its upper and left neighbors determine the provisional label, and any equivalence between their labels is recorded for subsequent union-find flattening.
- Morphological Operations (Optional): Morphological erosion/opening and dilation/closing using a structuring element (disk radius $1$–$3$ pixels) can optionally remove speckles or fill gaps before or after labeling.
Algorithmic Pseudocode
Input: RGB_image[Nx][Ny]
Output: Label_map[Nx][Ny], Region_areas[ ]
1. For each (i,j): beta[i][j] = (R+G+B)/3
2. (Optional) Gaussian smoothing
3. Choose threshold T
   For i,j: if beta[i][j] <= T: bin[i][j]=0 else bin[i][j]=1
4. (Optional) MorphOpen/MorphClose
5. Connected-component labeling (row-major order, union-find)
6. Relabeling
7. Region area counting
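The pseudocode steps 1, 3, 5, 6, and 7 can be sketched in plain Python as follows. This is a minimal illustration, not the reference implementation: the function names, the use of nested lists instead of an array library, and the toy image are all assumptions made for the example, and the optional smoothing and morphology steps are omitted.

```python
from collections import Counter

def brightness_map(rgb):
    """Step 1: beta = (R + G + B) / 3 for every pixel."""
    return [[sum(px) / 3.0 for px in row] for row in rgb]

def binarize(beta, threshold):
    """Step 3: 0 (black, foreground) where beta <= threshold, else 1."""
    return [[0 if v <= threshold else 1 for v in row] for row in beta]

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def label_components(binary):
    """Steps 5-6: two-pass 4-connected labeling of the black (0) pixels."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # index 0 is reserved for the background
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 0:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))      # new provisional label
                labels[r][c] = len(parent) - 1
            elif up and left:
                ru, rl = find(parent, up), find(parent, left)
                root = min(ru, rl)
                parent[max(ru, rl)] = root      # record the equivalence
                labels[r][c] = root
            else:
                labels[r][c] = up or left
    # Second pass: flatten equivalences and relabel consecutively from 1.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(parent, labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels

def region_areas(labels):
    """Step 7: pixel count per labeled region."""
    return Counter(v for row in labels for v in row if v)

# Toy image (hypothetical): D = dark cavity pixel, W = bright matrix pixel.
D, W = (30, 30, 30), (200, 200, 200)
rgb = [
    [D, W, D, W, W],
    [D, W, D, W, D],
    [D, D, D, W, D],
]
labels = label_components(binarize(brightness_map(rgb), threshold=128))
areas = region_areas(labels)
print(dict(areas))
```

The U-shaped dark region on the left exercises the equivalence step: its two arms receive different provisional labels during the raster scan and are merged when the scan reaches the bottom row.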
Parameterization and Performance
- Typical manual $T$: 100–160; automatic selection: Otsu or entropy-based.
- Gaussian $\sigma$: 0.5–2 pixels.
- IoU to expert mask: up to $0.82$; pixel accuracy: up to $0.94$.
- Characteristics and limitations: complexity linear in the pixel count, $O(N_x N_y)$; dependence on uniform global illumination; no explicit shape or texture discrimination (Sparavigna, 2016).
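Automatic threshold selection by Otsu's method can be sketched as below. This is a generic textbook formulation, assumed here for illustration rather than taken from the cited work; the function name, the 256-level default, and the sample intensities are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the T that maximizes between-class variance for the
    classes {v <= T} and {v > T} over integer intensities."""
    hist = [0] * levels
    for v in values:
        hist[int(v)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # sum of their intensities
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2  # unnormalized
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Hypothetical bimodal brightness sample: dark cavities and bright matrix.
values = [40] * 50 + [60] * 50 + [180] * 60 + [220] * 40
t_auto = otsu_threshold(values)
print(t_auto)
```

Because the comparison `beta <= T` assigns the threshold value itself to the dark class, the returned `T` can be passed directly to the binarization step. Every `T` between the two modes gives the same split, and this sketch returns the smallest of them.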
Summary Table—Threshold-Based VIBESegmentator
| Step | Parameter Choices / Metrics | Reference Values |
|---|---|---|
| Threshold | 100–160 (manual); Otsu/entropy | (basalt) |
| Gaussian smoothing ($\sigma$) | 0.5–2 pixels | |
| Morphology (disk radius) | 1–3 pixels | |
| IoU (expert mask) | 0.78–0.82 | |
| Accuracy (pixelwise) | 0.92–0.94 | |
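The two metrics in the table, IoU against an expert mask and pixelwise accuracy, can be computed as in the following sketch. The function name and the toy masks are illustrative assumptions; the foreground value `0` follows the convention above that black marks the object.

```python
def iou_and_accuracy(pred, ref, foreground=0):
    """Compare a predicted binary mask with a reference mask.

    IoU      = TP / (TP + FP + FN), computed on the foreground class.
    Accuracy = (TP + TN) / number of pixels.
    """
    tp = fp = fn = tn = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            p_fg, r_fg = p == foreground, r == foreground
            if p_fg and r_fg:
                tp += 1
            elif p_fg:
                fp += 1
            elif r_fg:
                fn += 1
            else:
                tn += 1
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # both masks empty: perfect match
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return iou, accuracy

# Toy masks (hypothetical): the prediction misses one foreground pixel.
pred = [[0, 0, 1, 1],
        [0, 1, 1, 1]]
ref = [[0, 0, 0, 1],
       [0, 1, 1, 1]]
iou, accuracy = iou_and_accuracy(pred, ref)
print(iou, accuracy)
```

Accuracy counts background agreement as well, which is why it is typically higher than IoU on the same pair of masks, as in the ranges reported in the table.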
2. Deep Learning VIBESegmentator for MRI Tissue Segmentation
The modern VIBESegmentator, also referred to as "TotalVibeSegmentator," applies a deep 3D nnU-Net for full-body VIBE MRI semantic and instance segmentation (Graf et al., 31 May 2024, Utkueri et al., 2 Dec 2025). The model produces 71 region-wise binary masks per scan, with class-wise output for organ, vessel, muscle, and fat compartments.
Network Architecture and Data Pipeline
- Backbone:
nnU-Net 3D U-Net (Isensee et al., 2021), five encoding/decoding stages, with feature expansion 32 → 64 → 128 → 256 → 512 on the down path, mirrored on the up path.
- Patch Size:
voxels; 1.4 mm in-plane, 3 mm through-plane.
- Channels:
Single super-channel from water/in-phase/out-of-phase contrasts (merged for training). Auxiliary: 11-quadrant spatial body localization mask downsampled to 4 mm.
- Preprocessing:
Axial MRI stacks are combined into a single volume, resampled to a standard grid, and z-score normalized per volume; label-dependent connected-component filtering is applied (Graf et al., 31 May 2024).
- Augmentation:
Elastic deformation, rotation, scaling, and intensity jitter (nnU-Net default), disabling random flips.
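Per-volume z-score normalization, listed under preprocessing, can be illustrated with the sketch below. It assumes the statistics are taken over all voxels of the volume and uses the population standard deviation; whether the pipeline restricts the statistics to a foreground mask is not stated here, and the function name and `eps` guard are hypothetical.

```python
from statistics import fmean, pstdev

def zscore_volume(volume, eps=1e-8):
    """Normalize a 3D volume (nested lists: slices > rows > voxels) to
    zero mean and unit variance. eps guards against a constant volume."""
    voxels = [v for sl in volume for row in sl for v in row]
    mu = fmean(voxels)
    sd = pstdev(voxels, mu)
    return [[[(v - mu) / (sd + eps) for v in row] for row in sl]
            for sl in volume]

# Tiny 2 x 2 x 2 volume with made-up intensities.
volume = [[[0.0, 2.0], [4.0, 6.0]],
          [[8.0, 10.0], [12.0, 14.0]]]
normalized = zscore_volume(volume)
flat = [v for sl in normalized for row in sl for v in row]
print(round(fmean(flat), 6), round(pstdev(flat), 6))
```

Normalizing each volume separately removes scanner- and subject-dependent intensity scaling, which matters for MRI because its intensities have no fixed physical unit.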
Loss Function and Optimization
- Composite Loss:
with ; , ; , .
- Optimizer:
SGD with Nesterov momentum $0.99$; weight decay ; "poly" learning rate schedule to 1000 epochs, batch size 2 (Graf et al., 31 May 2024).
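The "poly" learning rate schedule decays the rate polynomially toward zero over the training run. The sketch below shows the usual form; the initial rate of `0.01` and exponent of `0.9` are assumed values commonly associated with nnU-Net and are not given in the text above, while the 1000-epoch horizon is.

```python
def poly_lr(epoch, max_epochs=1000, initial_lr=0.01, exponent=0.9):
    """Polynomial decay: lr = initial_lr * (1 - epoch / max_epochs) ** exponent.

    initial_lr and exponent are assumptions for illustration.
    """
    return initial_lr * (1 - epoch / max_epochs) ** exponent

schedule = [poly_lr(e) for e in (0, 250, 500, 750, 1000)]
print(schedule)
```

With an exponent below 1 the decay is slightly slower than linear for most of training, and it reaches exactly zero at the final epoch.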
Training and Iterative Bootstrapping
Initial label bootstrapping leveraged CT-to-MRI label transfer and body composition segmentation (TotalSegmentator, Jung et al. nets, SPINEPS for vertebrae), with iterative manual correction by radiologists. The training set expanded from 4 to eventually 93 subjects, with augmentation and retraining to convergence.
Hold-out testing used 12 fully manually-corrected cases; abdomen-only comparison involved 1000 additional NAKO cases.
3. Semantic Structure Output and Label Organization
TotalVibeSegmentator produces 71 semantic region masks per subject, including but not limited to: spleen, kidneys, gallbladder, liver, stomach, pancreas, lungs, esophagus, trachea, thyroid, major vessels, heart, bone, multiple muscle groups, fat compartments (subcutaneous, inner), and spine. An additional vertebral instance model assigns 22 distinct vertebra classes (C3–L5) (Graf et al., 31 May 2024).
Example outputs include binary masks for each class and per-region statistical volumes. Subcutaneous adipose tissue (SAT) and "muscle (other)" are emphasized for downstream tasks such as registration (Utkueri et al., 2 Dec 2025).
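Per-region volumes follow from the voxel count of each label multiplied by the voxel volume. The sketch below assumes the label map is on the 1.4 × 1.4 × 3 mm grid mentioned earlier and that label `0` is background; the function name and the label IDs are hypothetical and do not reflect the actual class numbering.

```python
from collections import Counter

def region_volumes_ml(label_volume, spacing_mm=(1.4, 1.4, 3.0)):
    """Volume in millilitres for every non-zero label in a 3D label map
    (nested lists: slices > rows > voxels). 1 mL = 1000 mm^3."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    counts = Counter(v for sl in label_volume for row in sl
                     for v in row if v != 0)
    return {label: n * voxel_mm3 / 1000.0 for label, n in counts.items()}

# Toy 10 x 10 x 20 label map: label 1 fills the first 10 slices,
# label 2 fills the remaining 10 slices except a background border row.
label_volume = (
    [[[1] * 10 for _ in range(10)] for _ in range(10)]
    + [[[0] * 10] + [[2] * 10 for _ in range(9)] for _ in range(10)]
)
volumes = region_volumes_ml(label_volume)
print(volumes)
```

Here label 1 covers 1000 voxels and label 2 covers 900, so the volumes are about 5.88 mL and 5.29 mL at 5.88 mm³ per voxel.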
4. Quantitative Evaluation and Comparative Analysis
Reported performance metrics for the 3D nnU-Net-based VIBESegmentator are as follows:
- Mean Dice (12 test subjects, all 71 classes): $0.89$.
- Per-class Dice:
Abdominal organs except for pancreas ($0.83$), thyroid ($0.75$), gallbladder ($0.72$).
- Agreement Dice (1,000 abdominal cases):
Liver $0.93$, right kidney $0.93$, spleen $0.91$, pancreas $0.70$.
- Surface distance:
Median 1 mm, class-wise.
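The Dice scores above compare a predicted mask with a reference mask per class and are then averaged. A minimal sketch of that computation, with hypothetical function names and toy flattened masks, is:

```python
def dice(mask_a, mask_b):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for two flat binary masks."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    intersection = sum(x and y for x, y in zip(a, b))
    size_sum = sum(a) + sum(b)
    return 2.0 * intersection / size_sum if size_sum else 1.0

def mean_dice(pred_by_class, ref_by_class):
    """Unweighted mean of the per-class Dice scores."""
    scores = [dice(pred_by_class[k], ref_by_class[k]) for k in ref_by_class]
    return sum(scores) / len(scores)

# Toy flattened masks for two classes (values are made up).
pred = {"liver": [1, 1, 1, 0, 0, 0], "spleen": [0, 0, 1, 1, 0, 0]}
ref = {"liver": [1, 1, 0, 1, 0, 0], "spleen": [0, 0, 1, 1, 0, 0]}
scores = {k: dice(pred[k], ref[k]) for k in ref}
overall = mean_dice(pred, ref)
print(scores, overall)
```

Small structures are penalized more heavily by Dice for the same absolute number of mislabeled voxels, which is one reason classes such as gallbladder and thyroid tend to score lower than large organs.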
In the context of whole-body MRI registration, the use of SAT and muscle masks as additional registration channels improves cohort mean Dice by 6 percentage points over intensity-only methods ($0.77$ vs $0.71$ male; $0.75$ vs $0.69$ female for 71 masks) (Utkueri et al., 2 Dec 2025). Additional gains are realized compared to prior deep learning (uniGradICON, +9pp/+8pp) and traditional B-spline (MIRTK, +12pp/+13pp) baselines.
5. Integration into Multi-Channel Registration Pipelines
The VIBESegmentator-generated SAT and muscle masks, combined with FF and WF channels, are ingested by a graph-cut blockwise optimizer (Deform v0.5.2), using a cost function:
with , , . Optimization is performed via six-level resolution and blockwise multi-label graph-cut, trilinear/mask interpolation (Utkueri et al., 2 Dec 2025).
Table—Segmentation/Registration Performance Summary
| Method | Test Cohort Size | Mean Dice (All Masks) | Notable Comparison |
|---|---|---|---|
| VIBESegmentator (3D) | 12 | $0.89$ | most organs |
| Registration (SAT+muscle mask) | 4000 | $0.77$ (male); $0.75$ (female) | +6 pp vs intensity-only |
6. Limitations, Potential Extensions, and Applications
Identified Limitations
- Threshold-based: sensitive to illumination gradients, no use of shape/texture, susceptible to noise; only supports fixed global thresholding (Sparavigna, 2016).
- nnU-Net-based: Downstream registration relies on segmentation mask quality, especially for the SAT and muscle masks; it is limited by single-reference atlas bias; and memory use grows with the number of mask channels added to registration (Utkueri et al., 2 Dec 2025).
Potential Extensions
- Adaptive/local thresholding (e.g., Sauvola, Wolf) or multi-level Otsu for vesicular textures.
- Gradient-based edge refinement or texture-classification as post-processing layers.
- Use of synthetic or multi-atlas references to reduce bias in registration.
- Expansion to additional organs or tissue types by appending channels and retraining.
- Validation with further non-European populations and pediatric scans.
Existing and Emerging Applications
- Quantitative vesicle morphology analysis in earth sciences.
- Large-scale epidemiological body composition studies via automated MRI segmentation in NAKO and UK Biobank (Graf et al., 31 May 2024).
- Improved spatial normalization and anatomical correspondence for statistical phenotyping and age-correlation studies in population MRI (Utkueri et al., 2 Dec 2025).
7. Related Software Resources and Data Availability
Trained model weights, inference code, and pipeline specifications for the full-body 3D MRI version are publicly available at https://github.com/robert-graf/TotalVibeSegmentator. Open-source implementations enable reproducibility, adaptation to new datasets, and extension to variant downstream pipelines.
VIBESegmentator, spanning both threshold and nnU-Net paradigms, provides reference approaches for super-pixel segmentation and for semantic and instance segmentation in disparate imaging domains. The methods support large-scale, quantitative delineation of vesicles in materials analysis and of organs and tissues in medical imaging (Sparavigna, 2016, Graf et al., 31 May 2024, Utkueri et al., 2 Dec 2025).