Shape-Based Segmentation of 3D Vertebrae

Updated 30 January 2026

This overview covers shape-based segmentation approaches that use explicit geometric priors to improve 3D vertebrae segmentation accuracy in challenging medical images.
The techniques range from deformable models and region growing to graph-based optimization and low-rank contour descriptors, and they target issues such as inter-vertebra similarity and low-contrast conditions.
Deep neural architectures with integrated shape prior modules further refine segmentation, achieving high Dice coefficients and producing anatomically consistent vertebral masks.

Shape-based segmentation of 3D vertebrae involves the extraction of anatomically precise, contiguous vertebral masks from volumetric medical images. Unlike purely intensity- or patch-based methods, shape-based approaches impose explicit geometric or structural priors—ranging from deformable models and graph-based templates to low-rank contour descriptors and deep-learning shape prior modules—to constrain predictions and eliminate label inconsistencies, especially in regions of low contrast or pathological deformation. These methods address core challenges in vertebra segmentation: similar appearance of adjacent vertebrae, pathological alterations, intra-vertebrae label ambiguity, and the need for robust anatomical regularization across imaging modalities.

1. Motivation and Challenges in Vertebral Segmentation

Precise 3D vertebra segmentation is crucial for clinical diagnostics, surgical planning, and biomechanical modeling. Algorithms must contend with three dominant challenges (You et al., 2024):

Inter-vertebrae similarity: Adjacent vertebrae exhibit highly similar grayscale intensities and local textures on CT and MR scans, causing voxel-wise classifiers to misassign labels and create cross-boundary label bleed.
Pathological and artifact-induced distortions: Fractures, degeneration, scoliosis, and metal implants disrupt local intensity and shape cues, complicating bounding box or mask placement.
Intra-vertebrae segmentation inconsistency: Standard segmentation networks may produce fragmented, multi-label, or “holey” masks within a single vertebral body, violating the anatomical prior of a contiguous, single-label structure.

Shape-based methods address these by leveraging topological, morphological, or learned priors that regularize segmentation outputs even in difficult, artifact-laden regions.

2. Deformable Models and Morphological Operations

Early shape-based vertebra segmentation frameworks utilized 3D deformable models and region growing to encapsulate the vertebral body (Mastmeyer et al., 2017, Mastmeyer et al., 2017). A prototypical workflow entails:

Initialization: Placement of a spherical or ellipsoidal triangular mesh (balloon) within a search region based on user-defined or automated vertebral centers and spine canal extraction.
Balloon evolution: Vertices move via force-balance equations,

$m\,\ddot p_i + \gamma\,\dot p_i = f_{\rm image}(p_i) + f_{\rm smooth}(p_i) + f_{\rm shape}(p_i)$

with internal spring-like forces promoting mesh regularity and external image-based forces drawing the mesh to periosteal boundaries.

Mesh adaptation: Local refinement via vertex insertion or edge splitting captures sharp anatomical curvatures.
Seeded volume growing and morphology: High-intensity surface voxels initiate 3D region growing. Morphological closing and hole-filling produce solid masks, while iterative erosions/dilations separate the vertebral body from pedicles and processes.
Trabecular/cortical compartmentalization: Further erosions/thresholding partition bone into anatomical subregions.
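
As a minimal sketch of the force-balance update above, the following integrates the equation with a semi-implicit Euler step on a closed 2D contour rather than a 3D triangular mesh. The radial pull toward a circle of radius `target_radius` is a toy stand-in for $f_{\rm image}$, the neighbor-averaging term plays the role of $f_{\rm smooth}$, and $f_{\rm shape}$ is omitted. All function and parameter names are illustrative assumptions, not taken from the cited work.

```python
import math


def evolve_balloon(n_vertices=32, target_radius=5.0, steps=500, dt=0.1,
                   mass=1.0, damping=2.0, k_image=1.0, k_smooth=0.5):
    """Inflate a small closed contour toward a circular toy boundary."""
    # Start as a unit circle placed inside the structure of interest.
    pos = [(math.cos(2 * math.pi * i / n_vertices),
            math.sin(2 * math.pi * i / n_vertices)) for i in range(n_vertices)]
    vel = [(0.0, 0.0)] * n_vertices

    for _ in range(steps):
        forces = []
        for i, (x, y) in enumerate(pos):
            r = math.hypot(x, y)
            # Toy image force: radial pull toward the assumed boundary radius.
            pull = k_image * (target_radius - r)
            fx, fy = pull * x / r, pull * y / r
            # Smoothness force: move toward the midpoint of both neighbors.
            (px, py), (nx, ny) = pos[i - 1], pos[(i + 1) % n_vertices]
            fx += k_smooth * ((px + nx) / 2 - x)
            fy += k_smooth * ((py + ny) / 2 - y)
            forces.append((fx, fy))

        new_pos, new_vel = [], []
        for (x, y), (vx, vy), (fx, fy) in zip(pos, vel, forces):
            # m * a + gamma * v = f  =>  a = (f - gamma * v) / m
            vx += dt * (fx - damping * vx) / mass
            vy += dt * (fy - damping * vy) / mass
            new_vel.append((vx, vy))
            new_pos.append((x + dt * vx, y + dt * vy))
        pos, vel = new_pos, new_vel
    return pos


contour = evolve_balloon()
radii = [math.hypot(x, y) for x, y in contour]
```

In this toy setting the contour settles slightly inside the target radius because the smoothing force contracts the polygon, which illustrates the usual trade-off between mesh regularity and boundary adherence.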

These pipelines yield coefficient-of-variation errors below 1% for bone mineral density (BMD) and below 2% for volume, demonstrating strong intra-operator repeatability.
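
The seeded volume growing and hole-filling steps of the workflow can be illustrated with a small pure-Python sketch. It assumes a volume stored as nested lists indexed `[z][y][x]`, 6-connectivity, and a single intensity threshold; the helper names `flood`, `grow_region`, and `fill_holes` are hypothetical and the cited pipelines may use different criteria.

```python
from collections import deque

NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def flood(shape, seeds, accept):
    """Breadth-first 6-connected flood fill; returns the set of reached voxels."""
    reached = {s for s in seeds if accept(s)}
    queue = deque(reached)
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBORS:
            n = (z + dz, y + dy, x + dx)
            inside = all(0 <= c < s for c, s in zip(n, shape))
            if inside and n not in reached and accept(n):
                reached.add(n)
                queue.append(n)
    return reached


def grow_region(volume, seed, threshold):
    """Grow from a seed over voxels whose intensity is at least the threshold."""
    shape = (len(volume), len(volume[0]), len(volume[0][0]))
    mask = flood(shape, [seed], lambda v: volume[v[0]][v[1]][v[2]] >= threshold)
    return mask, shape


def fill_holes(mask, shape):
    """Fill cavities: anything not reachable from the volume border is kept."""
    voxels = [(z, y, x) for z in range(shape[0])
              for y in range(shape[1]) for x in range(shape[2])]
    border = [v for v in voxels
              if any(c in (0, s - 1) for c, s in zip(v, shape))]
    outside = flood(shape, border, lambda v: v not in mask)
    return set(voxels) - outside


# Toy volume: a bright 3x3x3 shell with a dark core, surrounded by background.
volume = [[[0] * 5 for _ in range(5)] for _ in range(5)]
for z in range(1, 4):
    for y in range(1, 4):
        for x in range(1, 4):
            volume[z][y][x] = 200
volume[2][2][2] = 50  # low-intensity interior voxel

shell, shape = grow_region(volume, seed=(1, 1, 1), threshold=100)
solid = fill_holes(shell, shape)
```

Here the grown region covers only the bright shell, and hole filling adds the enclosed dark voxel so that the result is a solid mask.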

3. Graph-Based and Template-Driven Segmentation

Template priors can be enforced via graph-based optimization as in Cube-Cut (Schwarzenberg et al., 2014). This algorithm constructs an s-t graph representing a cubic divergence of rays and layers sampled around a central seed point in MRI scans. Energy minimization is governed by:

$E(L) = \sum_{p\in P'} D_p(t_p) + \lambda\sum_{(p,q)\in\mathcal N} V_{p,q}(t_p, t_q)$

where $t_p$ is the label assigned to node $p$, $D_p$ is a data term based on local intensity, $\lambda$ weights the pairwise term, and $V_{p,q}$ enforces cubic smoothness through infinite-weight edges. The flexibility parameter $\Delta$ allows deviation from a perfect cube to accommodate anatomical variation. Cube-Cut achieves a Dice similarity coefficient of ≈81.3% in under one minute, requiring only single-point initialization and permitting controlled shape prior enforcement.
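
To make the role of $\Delta$ concrete, the sketch below minimizes an energy of the same form on a simple chain of rays, where each ray picks one boundary layer $t_p$. It is not the Cube-Cut s-t min-cut: dynamic programming is swapped in because it is exact on a chain and fits in a few lines. The pairwise cost is assumed to be $|t_p - t_q|$ when the difference is at most $\Delta$ and infinite otherwise; the soft part of that cost, the cost table `D`, and the function name are assumptions made for illustration.

```python
import math


def constrained_boundary(D, delta, lam):
    """Pick one layer per ray, minimizing data cost plus a bounded-jump penalty."""
    n_rays, n_layers = len(D), len(D[0])

    def pair_cost(a, b):
        # Infinite cost plays the role of the infinite-weight edges.
        jump = abs(a - b)
        return math.inf if jump > delta else lam * jump

    cost = list(D[0])   # best energy ending at each layer of the first ray
    back = []           # back-pointers for recovering the labeling
    for p in range(1, n_rays):
        new_cost, pointers = [], []
        for t in range(n_layers):
            best = min(range(n_layers), key=lambda s: cost[s] + pair_cost(s, t))
            new_cost.append(cost[best] + pair_cost(best, t) + D[p][t])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    t = min(range(n_layers), key=lambda s: cost[s])
    energy = cost[t]
    labels = [t]
    for pointers in reversed(back):
        t = pointers[t]
        labels.append(t)
    labels.reverse()
    return labels, energy


# Data costs for 4 rays x 4 layers; ray 1 prefers an outlying layer (3).
D = [
    [5, 1, 5, 5],
    [5, 5, 5, 1],
    [5, 1, 5, 5],
    [5, 5, 1, 5],
]
labels, energy = constrained_boundary(D, delta=1, lam=0.5)
```

With `delta=1` the outlying preference of the second ray is overruled, since a jump of two layers between neighboring rays is forbidden; a larger `delta` would allow the boundary to follow it.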

4. Low-Rank, Descriptor-Based Shape Regularization

Contemporary methods impose global shape consistency through data-driven low-rank contour descriptors. SLoRD (You et al., 2024) exemplifies this paradigm with the following procedure:

Spherical coordinate sampling: Each vertebral contour is parametrized as a dense vector of radial distances $\bm\rho$ from the estimated central point $(x_s, y_s, z_s)$, over a grid of angles $(\theta, \phi)$.
Training-time SVD basis construction: Ground-truth contours yield a matrix $\mathcal M$, decomposed via SVD,

$\mathcal M = U\,\Sigma\,V^\top.$

The top $k$ singular vectors $U_k$ capture dominant anatomical modes.

Inference subspace regression: The network regresses a coefficient vector $\bm c$. The reconstructed descriptor $\hat{\bm\rho} = U_k\,\bm c$ enforces that the predicted contour lies in the global anatomical subspace.
Losses: The objective combines centroid regression, boundary adherence, and conventional Dice/Cross-Entropy regularization.
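
The low-rank idea behind the steps above can be sketched with NumPy on synthetic data. This is an illustration under several assumptions, not the SLoRD implementation: contours are sampled over a single angle instead of a $(\theta, \phi)$ grid, the training set is generated from three hand-picked modes, contours are stacked as columns of `M`, and the coefficient vector `c` is obtained by projecting a noisy contour onto the basis, standing in for the network's regression. The names `U_k`, `c`, and `rho_hat` are chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_angles, n_train, k = 16, 20, 3
theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)

# Synthetic "anatomy": a mean radius plus two smooth modes of variation.
mean_shape = np.full(n_angles, 10.0)
mode_a = np.cos(2 * theta)
mode_b = np.sin(3 * theta)


def make_contour(a, b):
    """Radial-distance descriptor for given mode weights."""
    return mean_shape + a * mode_a + b * mode_b


# Training matrix: one ground-truth contour descriptor per column.
weights = rng.normal(size=(n_train, 2))
M = np.stack([make_contour(a, b) for a, b in weights], axis=1)

# SVD of the training matrix; keep the k leading left singular vectors.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
U_k = U[:, :k]

# A noisy prediction is pulled back into the learned subspace.
clean = make_contour(0.7, -0.4)
noisy = clean + rng.normal(scale=0.5, size=n_angles)
c = U_k.T @ noisy       # subspace coefficients (stand-in for the regression)
rho_hat = U_k @ c       # reconstructed descriptor

error_before = np.linalg.norm(noisy - clean)
error_after = np.linalg.norm(rho_hat - clean)
```

Because the clean contour lies in the span of the retained basis, the projection discards only the noise component outside that subspace, so `error_after` is smaller than `error_before`.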

SLoRD can refine the output of any upstream segmentation network, systematically eliminating label bleed and enforcing a single, contiguous mask per vertebra. On VerSe 2019, it delivers an average Dice ≈90.9% and a Hausdorff distance ≈6.1 mm, improving over single-stage and multi-stage baselines.
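
For readers unfamiliar with the two metrics quoted here, the following sketch computes a Dice coefficient and a symmetric Hausdorff distance on small voxel sets. It assumes masks given as sets of integer `(z, y, x)` coordinates and a brute-force distance search over all mask voxels; benchmark evaluations typically use optimized, surface-based implementations, and the function names are illustrative.

```python
import math


def dice_coefficient(a, b):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance in physical units (e.g. mm)."""
    def scaled(voxels):
        return [tuple(c * s for c, s in zip(v, spacing)) for v in voxels]

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    pa, pb = scaled(a), scaled(b)
    return max(directed(pa, pb), directed(pb, pa))


prediction = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
reference = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}

dice = dice_coefficient(prediction, reference)
hd = hausdorff_distance(prediction, reference)
```

Dice rewards volumetric overlap, while the Hausdorff distance is driven by the single worst boundary error, which is why the two are usually reported together.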

5. Deep Neural Architectures and Shape Priors

Modern deep learning pipelines fuse explicit shape priors with powerful semantic extractors. SpineMamba (Zhang et al., 2024) couples residual Visual Mamba blocks—hybrid SSM–CNN layers for global and local context modeling—with a vertebral shape prior (VSP) module:

Visual Mamba block: Implements linear and convolutional branches, with state-space convolution for long-range dependency modeling. Residual merges and normalization stabilize deep training.
Shape prior injection: A learnable shape-prior tensor serves as both global and local anatomical guidance, refined by additional VSS blocks and fused at feature level in decoder skip connections.
Combined loss: Dice plus Cross-Entropy, optionally augmented by mask-level prior penalties.
Performance: On CTSpine 1K (CT), SpineMamba achieves an average Dice similarity coefficient of 94.4%, outperforming nnU-Net by up to 2 percentage points. On MRSpineSeg (MR), the Dice reaches 86.95%.
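
The combined Dice plus cross-entropy objective listed above can be written out for the binary case as follows. This is a simplified sketch: it works on flat lists of foreground probabilities for one class, whereas a multi-vertebra network would use a multi-class formulation, and the equal default weights and smoothing constants are assumptions rather than the published settings.

```python
import math


def soft_dice_loss(probs, targets, eps=1e-6):
    """1 - soft Dice overlap between predicted probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def combined_loss(probs, targets, w_dice=1.0, w_ce=1.0):
    """Weighted sum of the region-level Dice term and the voxel-level CE term."""
    return (w_dice * soft_dice_loss(probs, targets)
            + w_ce * binary_cross_entropy(probs, targets))


targets = [1, 0, 1, 0]
good = combined_loss([0.9, 0.1, 0.8, 0.2], targets)
bad = combined_loss([0.1, 0.9, 0.2, 0.8], targets)
perfect = combined_loss([1.0, 0.0, 1.0, 0.0], targets)
```

The Dice term is insensitive to class imbalance between bone and background, while cross-entropy gives well-behaved per-voxel gradients, which is a common motivation for combining the two.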

This architecture embodies the trend toward anatomically regularized, multi-resolution, end-to-end learnable frameworks for robust vertebral segmentation.

6. Skeletonization, Region Decomposition, and Landmark Extraction

Anatomically detailed segmentation pipelines, as in SLD (Blomenkamp et al., 23 Jan 2026), perform mesh skeletonization and Potts-model graph-cut labeling to extract specific vertebral subregions:

Anatomical decomposition: Vertebral bodies, arch subregions (lamina, spinous/transverse processes, articular facets).
Region skeletons: 1D curves via 3D thinning/Laplacian skeletonization serve as priors for each subregion.
Probabilistic tubular priors: Vertex likelihoods decay radially from region skeletons.
Energy minimization: Graph-cut labeling in a multi-label Potts model, reinforced with smoothness weights derived from mesh adjacency.
Accuracy: SLD achieves mean Dice coefficients ≈0.93 overall and a mean surface distance (MSD) ≈0.5 mm, a marked improvement over simple intensity-based thresholding.
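
The interplay of tubular priors and Potts smoothing can be sketched on a handful of mesh vertices. The example assumes a Gaussian decay of the vertex likelihood with distance to the nearest skeleton point and uses iterated conditional modes (ICM) as a simple stand-in for the graph-cut solver; the skeletons, vertices, edges, and parameters `sigma` and `beta` are invented for illustration and do not come from the cited work.

```python
import math


def unary_costs(vertices, skeletons, sigma=1.0):
    """Negative log of a Gaussian tubular prior around each region skeleton."""
    costs = []
    for v in vertices:
        row = []
        for skeleton in skeletons:
            d = min(math.dist(v, s) for s in skeleton)
            row.append(d * d / (2.0 * sigma * sigma))
        costs.append(row)
    return costs


def potts_icm(costs, edges, beta=0.5, sweeps=10):
    """Greedy Potts-model labeling: unary cost plus beta per disagreeing neighbor."""
    n_labels = len(costs[0])
    neighbors = {i: [] for i in range(len(costs))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    labels = [min(range(n_labels), key=lambda l: row[l]) for row in costs]
    initial = list(labels)
    for _ in range(sweeps):
        changed = False
        for i, row in enumerate(costs):
            def local_energy(label):
                disagree = sum(1 for j in neighbors[i] if labels[j] != label)
                return row[label] + beta * disagree
            best = min(range(n_labels), key=local_energy)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return initial, labels


# Two straight region skeletons (1D curves) and five mesh vertices.
skeletons = [
    [(x, 0.0, 0.0) for x in range(5)],   # region 0
    [(x, 5.0, 0.0) for x in range(5)],   # region 1
]
vertices = [(1, 0.5, 0), (2, -0.5, 0), (3, 4.5, 0), (1, 5.5, 0), (2, 2.6, 0)]
edges = [(0, 1), (0, 4), (1, 4), (2, 3)]  # mesh adjacency (assumed)

costs = unary_costs(vertices, skeletons)
unary_labels, smoothed_labels = potts_icm(costs, edges)
```

The last vertex lies marginally closer to the second skeleton, but both of its mesh neighbors belong to the first region, so the smoothness term flips its label. This shows how adjacency-derived weights suppress isolated mislabeled vertices.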

This strategy supports not only precise segmentation but also robust anatomical landmark detection for biomechanical modeling.

7. Limitations and Future Directions

Existing shape-based segmentation frameworks are characterized by domain-specific design choices:

Coverage limitation: Some methods focus on vertebral bodies, omitting processes and discs (You et al., 2024).
Template rigidity: Graph-based cubic priors require careful placement and parameterization; complex deformities may elude such constraints (Schwarzenberg et al., 2014).
Reliance on initializations: Classical deformable models mandate manual center input; automated centering and full statistical atlases remain avenues for further improvement (Mastmeyer et al., 2017).
Patch-based dependency: Refinement approaches such as SLoRD rely on sufficiently accurate initial coarse masks and centroids (You et al., 2024).
Generalization across pathologies: While hybrid neural and shape-prior methods demonstrate robustness, further validation on diverse, multi-center, and highly pathological datasets is necessary (Zhang et al., 2024).

Expansion to unsupervised or semi-supervised learning regimes, adaptive region priors, more flexible subregion modeling, and joint segmentation-reconstruction approaches (e.g., via integration with shape-completion networks (Massalimova et al., 2024)) are plausible directions for future research.

Shape-based segmentation of 3D vertebrae spans a continuum from classic explicit deformable models through graph-based shape templates to advanced low-rank anatomical priors and deep neural architectures. Each method offers distinctive advantages in anatomical regularization, label consistency, and robustness to imaging artifacts, advancing quantitative spinal analysis and clinical practice.