
Geometric Deep Skeletonization

Updated 1 February 2026
  • Geometric deep skeletonization is a method that extracts a shape’s medial axis using deep learning combined with classical geometric concepts.
  • Techniques include flux-based regression, multi-scale fusion, point-cloud methods, and iterative thinning to ensure topology preservation and accurate scale encoding.
  • These approaches enhance applications in 2D/3D image analysis, anatomical modeling, and CAD by achieving improved speed, precision, and structural fidelity.

Geometric deep skeletonization refers to a class of methods that extract topological medial representations (skeletons) from images or volumes via deep neural architectures, with explicit geometric reasoning. These models seek to combine the benefits of classical skeletonization (medial axis, scale encoding, topology preservation) with the robustness and learning capacity of modern deep networks. Techniques span supervised regression of geometric fields, multi-task fusion of scale-specific side outputs, explicit skeleton sheet fitting, deep iterative thinning, end-to-end differentiable modules, and generative skeleton estimation. The central goal is to produce structurally minimal, geometrically faithful, and topologically correct skeletons applicable to diverse modalities from natural images and medical scans to point clouds and 3D meshes.

1. Mathematical Foundations and Geometric Representations

Geometric skeletonization is grounded in the medial axis transform (MAT), defined for a shape $\Omega \subset \mathbb{R}^d$ as the locus of points with multiple equidistant closest points on the boundary $\partial\Omega$, equivalently the centers of maximally inscribed balls. For 2D objects, this translates to the set:

$$S = \{\, x \in \Omega \mid \exists\, r > 0 \text{ s.t. } B(x,r) \subseteq \Omega \text{ and } B(x,r') \not\subseteq \Omega,\ \forall\, r' > r \,\}$$

and the per-pixel object thickness $s(x)$ is twice the radius of the maximal ball at $x$ (Shen et al., 2016, Shen et al., 2016).
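As an illustration of these definitions, the inscribed-ball radius and a crude medial-axis estimate can be computed from a binary mask with a Euclidean distance transform. This is a sketch only: local distance-transform maxima approximate the medial axis but the procedure is not topology-preserving.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, maximum_filter

def medial_axis_sketch(mask):
    """Approximate the medial axis of a binary mask as the set of local
    maxima of the Euclidean distance transform; the per-pixel thickness
    s(x) is twice the inscribed-ball radius."""
    r = distance_transform_edt(mask)          # radius of maximal inscribed ball
    ridge = (r == maximum_filter(r, size=3)) & mask.astype(bool)
    return ridge, 2.0 * r                     # skeleton candidates, thickness

# A 5-pixel-wide horizontal bar: the medial axis runs along its centreline.
bar = np.zeros((7, 15), dtype=bool)
bar[1:6, 1:14] = True
skel, thickness = medial_axis_sketch(bar)
```

For the bar above, the ridge sits on the central row, where the inscribed-ball radius (and hence the thickness) is largest.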

The geometric character of skeletonization differs from edge detection or segmentation: it encodes both topology (branches, loops) and object part scale (width/thickness). Some methods regress a vector field mapping each pixel to its candidate skeleton location (“context flux”), while others construct discrete point clouds or graphs decomposing skeletons into “curve-like” or “sheet-like” structures (Tang et al., 2019, Khargonkar et al., 2023).

2. Deep Learning Architectures for Skeleton Extraction

Several deep architectures operationalize geometric skeletonization principles:

Flux-based regression. DeepFlux predicts a dense 2D flux vector field $F(p)$ for every pixel, indicating the direction to the nearest skeleton pixel within a context region $R_c$ dilated around the ground-truth skeleton $R_s$:

$$F(p) = \begin{cases} \dfrac{N_p - p}{\|N_p - p\|}, & p \in R_c \\[4pt] (0,0), & p \in R_s \cup (\Omega \setminus D_r(R_s)) \end{cases}$$

with $N_p$ the nearest skeleton pixel. A VGG-16 + ASPP backbone produces multi-scale features and regresses $F$ via a weighted $L_2$ loss (Wang et al., 2018).
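The ground-truth context flux target described above can be constructed with a distance transform that also returns nearest-skeleton indices. This is a sketch of the target construction only, not DeepFlux training code; the band radius is a free parameter.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_dilation

def context_flux(skeleton, context_radius=5):
    """Build a DeepFlux-style ground-truth flux field: each pixel in the
    dilated context band points toward its nearest skeleton pixel N_p;
    skeleton pixels and pixels outside the band get (0, 0)."""
    # distance and indices of the nearest skeleton pixel for every location
    dist, (iy, ix) = distance_transform_edt(~skeleton, return_indices=True)
    yy, xx = np.mgrid[0:skeleton.shape[0], 0:skeleton.shape[1]]
    F = np.stack([iy - yy, ix - xx]).astype(float)   # vector p -> N_p
    norm = np.maximum(np.hypot(F[0], F[1]), 1e-12)
    F /= norm                                        # unit vectors
    band = binary_dilation(skeleton, iterations=context_radius) & ~skeleton
    F *= band                                        # zero outside the band
    return F

# Horizontal skeleton segment; pixels above it should point straight down.
skeleton = np.zeros((11, 11), dtype=bool)
skeleton[5, 2:9] = True
F = context_flux(skeleton, context_radius=3)
```

A pixel two rows above the segment receives the unit vector (1, 0) in (row, col) convention, i.e. it points toward the skeleton below it.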

Multi-scale side-output fusion (FSDS/LMSDS/DeepSkeleton). These models attach side branches at multiple backbone depths (with receptive fields $\sim$ object part thickness), each supervised to localize skeleton pixels only at the scales their receptive fields can reliably see. A fusion layer aggregates scale-specific predictions, allowing simultaneous localization of fine and coarse skeleton structures and direct object thickness estimation (Shen et al., 2016, Shen et al., 2016).
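The scale-associated supervision above assigns each annotated skeleton pixel to the shallowest side output whose receptive field covers its scale. A minimal numpy sketch of that assignment, with hypothetical receptive-field sizes:

```python
import numpy as np

def assign_scale_to_stage(scale_map, receptive_fields=(14, 40, 92, 196)):
    """Assign each pixel's scale to the shallowest side output whose
    receptive field covers it (FSDS-style scale-associated supervision).
    The receptive-field sizes here are illustrative placeholders."""
    stage = np.full(np.shape(scale_map), -1, dtype=int)  # -1: no stage sees it
    # Sweep from the deepest (largest RF) stage to the shallowest, so the
    # smallest covering receptive field wins.
    for idx in range(len(receptive_fields) - 1, -1, -1):
        stage[scale_map <= receptive_fields[idx]] = idx
    return stage
```

A pixel of scale 50 falls to the stage with receptive field 92 (index 2), while a scale of 300 exceeds every receptive field and is left unassigned.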

Point-cloud and graph-based skeletons. Recent advances use point-based encoders such as PointNet++ to learn convex combinations of surface sample points yielding skeletal sheets (s-reps), with explicit radius and spoke direction prediction. Geometric losses (Chamfer distance, medial enforcement, spread regularization) push the skeleton points to correctly cover the interior and maintain correct scale (Khargonkar et al., 2023). Extension to graph convolutional networks enables direct output of template-aligned s-rep graphs from volumetric segmentation maps (Gaggion et al., 2024).
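The Chamfer term in these geometric losses can be sketched directly in numpy. This is a brute-force illustrative version; real pipelines compute it with batched tensor operations inside the training graph.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (N, d) and Q (M, d):
    the mean nearest-neighbour squared distance in each direction. A common
    geometric loss for fitting skeletal points to a target surface."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

For identical point sets the loss is zero; for a single point offset by (3, 4) each direction contributes a squared distance of 25.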

Iterative thinning via compact CNNs. Skelite employs a tiny CNN within a differentiable morphological thinning loop, receiving the image, current boundary band, and partial skeleton at each step. The learned deletion mask determines removable boundary pixels, iteratively refining the skeleton. This approach achieves dramatic speedups over Boolean or topology-constrained thinning while retaining connectivity (Vargas et al., 10 Mar 2025).
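For reference, the kind of Boolean thinning loop that Skelite's learned deletion mask replaces can be sketched with the classical two-subiteration Zhang-Suen rule (this is the standard baseline, not Skelite's learned rule):

```python
import numpy as np

def zhang_suen_thinning(img):
    """Classical two-subiteration thinning: iteratively peel boundary
    pixels whose removal keeps the foreground connected. Assumes a
    zero-padded border (np.roll wraps around)."""
    img = img.astype(np.uint8).copy()

    def neighbours(a):
        # P2..P9: clockwise 8-neighbourhood starting from the pixel above.
        return [np.roll(a, s, axis=(0, 1)) for s in
                [(1, 0), (1, -1), (0, -1), (-1, -1),
                 (-1, 0), (-1, 1), (0, 1), (1, 1)]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            P = neighbours(img)
            B = sum(P)                                   # foreground neighbours
            seq = P + [P[0]]
            A = sum(((seq[i] == 0) & (seq[i + 1] == 1)).astype(np.uint8)
                    for i in range(8))                   # 0 -> 1 transitions
            if step == 0:
                c1, c2 = P[0] * P[2] * P[4], P[2] * P[4] * P[6]
            else:
                c1, c2 = P[0] * P[2] * P[6], P[0] * P[4] * P[6]
            kill = (img == 1) & (B >= 2) & (B <= 6) & (A == 1) \
                   & (c1 == 0) & (c2 == 0)
            if kill.any():
                img[kill] = 0
                changed = True
    return img.astype(bool)

# A 3-pixel-thick bar thins to (roughly) its one-pixel centreline.
bar = np.zeros((5, 12), dtype=np.uint8)
bar[1:4, 1:11] = 1
sk = zhang_suen_thinning(bar)
```

The learned variant keeps this outer loop but predicts the deletion mask with a compact CNN instead of the Boolean conditions, which is where the reported speedups come from.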

Differentiable skeletonization modules. Fully differentiable algorithms have been proposed based solely on tensor operations, convolutions, and basic nonlinearities, including subfield-parallel boundary peeling and Gumbel-Sigmoid for Bernoulli discretization. Simple-point detection is done via Euler-characteristic or Boolean rules in 3D neighborhoods, maintaining topological invariance throughout thinning (Menten et al., 2023).
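The Gumbel-Sigmoid discretization mentioned above can be sketched as follows (numpy, forward pass only; the straight-through gradient behavior only matters under autodiff):

```python
import numpy as np

def gumbel_sigmoid(logits, tau=1.0, hard=True, rng=None):
    """Reparameterized Bernoulli relaxation used to discretize soft
    skeleton probabilities inside a differentiable pipeline. Adding
    logistic noise (the difference of two Gumbel samples) and applying
    a temperature-scaled sigmoid gives a relaxed Bernoulli sample."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(logits))
    noise = np.log(u) - np.log1p(-u)          # logistic noise = G1 - G2
    y_soft = 1.0 / (1.0 + np.exp(-(logits + noise) / tau))
    return (y_soft > 0.5).astype(float) if hard else y_soft
```

With very confident logits the noise cannot flip the decision, so the hard samples match the deterministic threshold.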

3. Training Protocols, Losses, and Evaluation

Supervised skeletonization requires annotated ground-truth (pixelwise skeletons, medial sheets, or point clouds). Training datasets span natural image benchmarks (SK-LARGE, SK506, WH-SYMMAX, SYMMAX300), binary shape datasets (SkelNetOn), and 3D medical volumes (aorta CT, left atrium MRI, hippocampus MRI) (Wang et al., 2018, Shen et al., 2016, Pepe et al., 2024).

Key loss functions include weighted $L_2$ regression of the flux field (Wang et al., 2018), scale-associated side-output losses for multi-scale fusion (Shen et al., 2016), and geometric point-cloud losses such as Chamfer distance, medial enforcement, and spread regularization (Khargonkar et al., 2023).

Quantitative metrics universally include maximum F-measure, precision-recall curves, Chamfer/Hausdorff errors, Dice overlap, Betti number errors (topological invariants), and computational efficiency in ms/image or s/volume (Shen et al., 2016, Wang et al., 2018, Menten et al., 2023, Vargas et al., 10 Mar 2025).
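As an illustration of the simpler metrics, a minimal numpy/scipy computation of the pixel-wise F-measure and the beta_0 (connected-component count) error might look like this. Benchmarks usually also allow a small spatial matching tolerance, which is omitted here.

```python
import numpy as np
from scipy.ndimage import label

def skeleton_metrics(pred, gt):
    """Pixel-wise precision/recall/F-measure plus the beta_0 error
    (difference in connected-component counts) between a predicted
    and a ground-truth binary skeleton."""
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f = 2 * precision * recall / max(precision + recall, 1e-12)
    b0_err = abs(label(pred)[1] - label(gt)[1])   # Betti-0 difference
    return f, b0_err
```

A perfect prediction scores F = 1 with zero beta_0 error; an empty prediction against a one-component ground truth scores F = 0 with a beta_0 error of 1.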

4. Postprocessing, Topology, and Scale Consistency

Flux and side-output methods apply postprocessing to convert vector predictions to discrete skeletons. DeepFlux uses compass-direction quantization and morphological closing for 1-pixel smoothing (Wang et al., 2018). Multi-scale fusion models infer per-pixel scale and reconstruct segmentation masks by union of disks anchored at skeleton pixels (Shen et al., 2016).
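The disk-union reconstruction can be sketched as follows (illustrative only; in the fusion models the per-pixel radii come from the predicted scale map):

```python
import numpy as np

def reconstruct_from_skeleton(skel_points, radii, shape):
    """Rebuild a segmentation mask as the union of disks anchored at
    skeleton pixels, using each pixel's predicted radius (scale)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for (cy, cx), r in zip(skel_points, radii):
        mask |= (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    return mask
```

A single skeleton point with radius 2 reconstructs a 13-pixel discrete disk; a full skeleton with per-point radii reconstructs the whole object mask.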

Topological preservation is a unifying concern. Boolean and Euler-based thinning explicitly prevent deletions that alter foreground and background connectivity, ensuring correct branch and loop retention (Menten et al., 2023). Skelite demonstrates that compact neural thinning can reproduce Boolean skeletons nearly exactly, with 100x speedup (Vargas et al., 10 Mar 2025). Generative models using the MAT, as in GEM3D, preserve holes and high-genus topology in synthesized shapes by modeling skeleton point connectivity directly through denoising diffusion (Petrov et al., 2024).

Scale consistency arises from multi-level feature fusion, skeleton sheets with radius fields, and side-output mediation, directly encoding local thickness and enabling reconstruction of minimal representations (Shen et al., 2016, Khargonkar et al., 2023).

5. Extensions to 3D, Point Clouds, and Generative Models

Geometric deep skeletonization methods generalize beyond 2D images. DeepFlux proposes learning 3D vector fields in volumetric contexts for medical shapes (Wang et al., 2018). PointNet++ encoders and spectral GCN decoders output s-rep skeleton sheets and spoke graphs directly from boundary samples or masks (Khargonkar et al., 2023, Gaggion et al., 2024). Deep Medial Voxels approximate medial axes using learned distance fields and convolution surfaces, supporting mesh generation for anatomical modeling and CFD/FSI pipelines (Pepe et al., 2024).

Generative models such as GEM3D explicitly synthesize skeleton point clouds via diffusion, learning MAT geometry and latent shape features, then fabricate surfaces via skeleton-driven implicit fields for complex topological objects (Petrov et al., 2024). Skeleton-bridged approaches inflate learned skeleton point clouds to occupancy grids and meshes, allowing efficient and faithful 3D reconstruction from single RGB images (Tang et al., 2019).

6. Comparative Results, Limitations, and Applications

Comparisons across methods demonstrate that geometric skeletonization yields higher F-measures on skeleton extraction, lower Chamfer/Hausdorff errors, and better topology preservation than edge/segmentation baselines or classical morphology. DeepFlux, FSDS, LMSDS, and DeepSkeleton consistently outperform pixel classification and traditional extraction in both accuracy and generalization (Wang et al., 2018, Shen et al., 2016, Shen et al., 2016).

Quantitative highlights:

  • DeepFlux: F=0.732 (SK-LARGE), 0.840 (WH-SYMMAX); runtime ≈19 ms/image (Wang et al., 2018).
  • FSDS: F=0.623 (SK506), 0.769 (WH-SYMMAX) (Shen et al., 2016).
  • Skelite: Dice=0.78, β0/β1 errors <3.2, runtime 8 ms (DRIVE dataset) (Vargas et al., 10 Mar 2025).
  • Fully differentiable skeletonization: zero topology error, compatible with segmentation and registration pipelines (Menten et al., 2023).
  • GEM3D: best shape fidelity (MMD-CD 8.64e-2), high precision/recall, state-of-the-art surface fidelity for high-genus shapes (Petrov et al., 2024).

Limitations noted include the need for annotated skeletal ground truth, noise sensitivity near thin/occluded structures, fixed skeleton point counts limiting adaptation to highly variable geometry, and lack of differentiable postprocessing in some pipelines (Wang et al., 2018, Khargonkar et al., 2023, Gaggion et al., 2024). Future directions include adaptive skeleton composition, joint segmentation/skeletonization training, spline-based continuous skeleton representations, incorporation of curvature and normal priors, and adversarial skeleton sharpening.

Applications encompass object detection, segmentation (especially for curvilinear structures), mesh reconstruction, shape synthesis, anatomical modeling, CAD analysis, and topologically robust registration. Skeleton-based priors and losses increasingly inform downstream learning objectives in medical imaging and geometric computer vision.

7. Dataset Availability and Source Code

Key datasets for skeletonization research (SK-LARGE, SK506, WH-SYMMAX, SYMMAX300, ShapeNet-Skeleton, Pixel SkelNetOn) and the corresponding source code for FSDS, LMSDS, and DeepSkeleton are public (Shen et al., 2016, Shen et al., 2016). Other skeleton generators (SlicerSALT for s-reps, AIRLab for registration) are referenced for template and fitting-based pipelines (Khargonkar et al., 2023, Menten et al., 2023). This open data enables reproducible benchmarking and facilitates comparative evaluation across geometric deep skeletonization algorithms.
