Deep Learning Nerve Segmentation
- Deep learning nerve segmentation is a method that uses CNNs and transformers to extract nerve structures from various biomedical images.
- It leverages encoder-decoder architectures and topology-aware losses to overcome challenges like small object detection and class imbalance.
- The approach enhances diagnostic accuracy and surgical planning across modalities such as microscopy, ultrasound, MRI, CT, and OCT.
Deep learning-based nerve segmentation refers to the use of modern neural network architectures, primarily convolutional neural networks (CNNs) and transformers, to delineate neural structures from various biomedical imaging modalities. Precise segmentation of nerves—including axons, myelin, nerve fibers, nerve trunks, and roots—is crucial for quantitative morphometry, surgical planning, disease monitoring, and neurophysiological studies. This article synthesizes the core methodologies, architectural advances, validation strategies, and critical challenges in deep learning-based nerve segmentation across imaging domains such as microscopy, ultrasound, CT, and MRI.
1. Imaging Modalities and Application Scope
Deep learning-based segmentation systems target a range of nerve structures across multiple modalities:
- Microscopy (SEM/TEM/CCM): Axon and myelin segmentation for morphometry and neuropathology, e.g., AxonDeepSeg for electron microscopy data (Zaimi et al., 2017), spatially constrained CNNs for corneal nerve fibers (Zhang et al., 2020), hierarchical self-supervised transformers for diabetic neuropathy diagnosis in corneal confocal microscopy (Zhang et al., 24 Jun 2025).
- Ultrasound (US): Peripheral nerve identification for ultrasound-guided regional anesthesia (UGRA), including brachial plexus, supraclavicular nerves, and vagus nerve (Wang et al., 2022, Miyatake et al., 2022, Al-Battal et al., 2021), with domain adaptation, device-mixing, and hybrid models (Yves et al., 31 Jan 2026, Boxtel et al., 2021).
- Magnetic Resonance Imaging (MRI): Segmentation of spinal cord nerve rootlets and functional-level analysis using 3D multi-class CNNs with active learning (Valosek et al., 2024).
- Computed Tomography (CT): Lumbosacral nerves, optic nerve, and facial nerve segmentation using 3D U-Nets or uncertainty-aware dual-stream models (Fan et al., 2018, Zhu et al., 2020, Zhu et al., 2024).
- Optical Coherence Tomography (OCT): Multi-layer segmentation of the optic nerve head (ONH) for glaucoma and neurodegeneration biomarker extraction (Devalla et al., 2018, Devalla et al., 2020), with device-independent harmonization (Marques et al., 2021).
These systems enable both volumetric and thin-structure segmentations, addressing varying signal, noise, and class-imbalance regimes.
2. Network Architectures and Methodological Advances
The dominant architectural patterns are encoder–decoder networks based on U-Net variants, often adapted to the dimensionality or topology of the imaging context:
| Key Architecture | Modality/Task | Notable Features | Reported Metric (Nerve) |
|---|---|---|---|
| Standard/U-shaped U-Net | US, MRI, microscopy, neuron cubes | Encoder–decoder, skip connections, batch-norm, dropout | Dice: up to 0.905 (Fan et al., 2018) |
| Attention U-Net | US (brachial plexus) | Channel/spatial attention gating; highest accuracy among compared models (Wang et al., 2022) | IoU: 0.5238 |
| Dilated U-Net/DeepLab | US (supraclavicular) | Expanded bottleneck, atrous convolutions, multi-scale context (Thomas et al., 16 Jul 2025, Miyatake et al., 2022) | Dice: 0.56–0.78 |
| Hierarchical Vision Transformers (HMSViT) | CCM, DPN diagnosis | Multi-scale pooling, dual-attention, block-masked SSL (Zhang et al., 24 Jun 2025) | mIoU: 0.6134 |
| Wavelet-Integrated 3D U-Net | Neuronal microstructure | 3D DWT/IDWT for noise/topology, hard-shrink denoising (Li et al., 2021) | mIoU: 0.7706 |
| Uncertainty-Aware Dual Stream (UADSN) | CT (facial nerve) | Synchronized 2D+3D deep streams, uncertainty masking, clDice topology loss (Zhu et al., 2024) | Dice: 0.7979 |
Architectures are selected and tailored to the distinct challenges of each imaging environment:
- Small-object detection (nerve bundles, corneal fibers) benefits from attention modules, CNN–CRF hybrids, or topology-aware losses (clDice).
- Device/domain adaptation is approached with enhancer (harmonization) networks, domain-mixing during training, or block-masked self-supervised learning.
- Three-dimensional context is handled with 3D U-Nets, SV-net, or wavelet-augmented architectures, especially in neuron tracing, rootlet, or lumbosacral nerve segmentation.
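To make the shared encoder–decoder pattern concrete, the following is a minimal one-level U-Net-style network in PyTorch. It is a toy sketch for illustration only, not any of the cited models; all names (`TinyUNet`, `conv_block`) and sizes are arbitrary choices.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with batch-norm and ReLU, as in standard U-Net variants.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Minimal one-level encoder-decoder with a single skip connection."""
    def __init__(self, in_ch=1, n_classes=2, base=16):
        super().__init__()
        self.enc = conv_block(in_ch, base)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base, base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = conv_block(base * 2, base)  # concatenated skip doubles the channels
        self.head = nn.Conv2d(base, n_classes, 1)

    def forward(self, x):
        e = self.enc(x)                    # full-resolution features
        b = self.bottleneck(self.down(e))  # half-resolution context
        u = self.up(b)                     # learned upsampling back to full resolution
        return self.head(self.dec(torch.cat([u, e], dim=1)))  # skip connection

model = TinyUNet()
out = model(torch.randn(2, 1, 64, 64))  # per-pixel class logits, same spatial size
```

The skip connection is what lets the decoder recover thin structures: fine spatial detail lost in pooling is re-injected from the encoder, which matters especially for narrow nerve fibers.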
3. Loss Functions, Training Protocols, and Augmentation
Losses are typically composed to optimize both pixel-wise and structural concordance:
- Binary/multi-class cross-entropy for pixel-wise assignment with class weighting for imbalance (Fan et al., 2018, Valosek et al., 2024).
- Dice loss for small structures to maximize region overlap.
- Lovász hinge/Jaccard loss to directly optimize intersection-over-union (IoU) for foreground predictions (Wang et al., 2022).
- Topology-preserving clDice loss, especially for elongated or tubular nerves (Zhu et al., 2024).
- Consistency/adversarial loss in hybrid, uncertainty-aware, or semi-supervised configurations (Zhu et al., 2024, Marques et al., 2021).
- Self-supervised loss employing masked-reconstruction on unlabelled data (SSL) (Zhang et al., 24 Jun 2025).
Common augmentation and preprocessing steps include geometric transforms, intensity normalization or histogram equalization, patch/cube cropping (especially in 3D), and augmentation mimicking anatomical variability (random scaling, elastic deformations, contrast jittering).
Threshold selection for binarization may be grid-searched and optimized directly on validation metrics (e.g., T = 0.14 for DeepLabV3-based US segmentation) (Thomas et al., 16 Jul 2025).
4. Dataset Curation, Annotations, and Validation Strategies
Robust annotation and validation protocols are fundamental:
- Dataset sizes span from compact (28 annotated volumes for facial nerve CT (Zhu et al., 2024)) to large public datasets (7,879 orbital CT slices (Zhu et al., 2020), >6,000 US images (Al-Battal et al., 2021)).
- Annotation types range from dense pixel-wise masks (microscopy, CT, MRI, some US) to weak labels (bounding-box masks in US tracking (Al-Battal et al., 2021)) and skeletonized traces (corneal/confocal microscopy (Zhang et al., 2020)).
- Cross-device or cross-site validation is essential for generalization, e.g., training on multiple US or OCT machines and explicitly reporting inter-vendor and inter-site metric variance (Valosek et al., 2024, Devalla et al., 2020, Yves et al., 31 Jan 2026).
- Active learning is increasingly used to minimize expert annotation burden by iterative model-in-the-loop corrections (Valosek et al., 2024).
- Metric selection: Dice coefficient, IoU, accuracy, sensitivity/specificity, and volumetric agreement are standard; some works also use boundary-based metrics (ASSD, Hausdorff) or topological scores (clDice) for thin structures.
Validation is typically performed via k-fold cross-validation or leave-one-subject-out splits, with careful patient-level separation to avoid data leakage; ablation studies quantify the contribution of individual architectural modules and loss terms.
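The patient-level separation mentioned above means assigning whole subjects, not individual slices, to folds. A minimal grouped-fold sketch in numpy (function and variable names are illustrative; scikit-learn's GroupKFold provides the same behavior):

```python
import numpy as np
from collections import defaultdict

def patient_level_folds(sample_patient_ids, k=5, seed=0):
    """Assign whole patients to folds so no subject's slices appear in both
    the training and the test split of any fold."""
    patients = sorted(set(sample_patient_ids))
    rng = np.random.default_rng(seed)
    rng.shuffle(patients)
    fold_of = {p: i % k for i, p in enumerate(patients)}  # round-robin over patients
    folds = defaultdict(list)
    for idx, pid in enumerate(sample_patient_ids):
        folds[fold_of[pid]].append(idx)                   # sample follows its patient
    return [sorted(folds[i]) for i in range(k)]
```

Splitting at the slice level instead would leak near-duplicate anatomy between train and test and inflate reported Dice.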
5. Quantitative Performance and Comparative Analysis
Performance varies by modality and task, reflecting differences in nerve size, imaging artifacts, annotation scope, and data quality. Representative results include:
| Task/Modality | Architecture | Reported Metric (Nerve) | Reference |
|---|---|---|---|
| Brachial plexus US (binary) | U-Net/Att U-Net | IoU: 0.5238 (Att U-Net, comparable or superior to best doctor) | (Wang et al., 2022) |
| Brachial plexus US (multi-class) | U-Net | Dice drop: up to –61% for small nerves (class imbalance) | (Yves et al., 31 Jan 2026) |
| Facial nerve CT | UADSN | Dice: 0.7979, ASSD: 0.0952 mm | (Zhu et al., 2024) |
| Lumbosacral nerve CT | 3D U-Net | Dice: 0.905, IoU: 0.827 | (Fan et al., 2018) |
| Optic nerve/orbit CT | SV-net (3D V-Net) | IoU: 0.8337 (nerve), mIoU: 0.8207 | (Zhu et al., 2020) |
| Corneal nerve fiber segmentation | CRF-constrained U-Net | Dice: 0.80 (synthetic), qualitative recovery of thick/fine fibers | (Zhang et al., 2020) |
| Corneal CCM (ViT/SSL) | HMSViT | mIoU: 0.6134 (outperforms hierarchical Swin/HiViT by ~6%) | (Zhang et al., 24 Jun 2025) |
| ONH, multi-layer OCT | DRUNET | Dice: mean 0.91 (all tissues) | (Devalla et al., 2018) |
| Spinal rootlets MRI | 3D U-Net+AL | Dice: 0.67 ± 0.16 (C2–C8) | (Valosek et al., 2024) |
| Vagus nerve US (tracking) | Weakly supervised U-Net | Precision: >94%, Recall: >97% | (Al-Battal et al., 2021) |
| Supraclavicular nerve US | Dilated U-Net | Dice: 0.56 (dilated) vs. 0.52 (standard) | (Miyatake et al., 2022) |
A recurring observation is the degradation of small-structure (e.g., nerve fiber) Dice under class imbalance without loss reweighting or topology constraints (Yves et al., 31 Jan 2026). Attention gates, SSL or harmonization pipelines, and topology-aware losses improve robustness and help preserve boundary and structural continuity.
6. Critical Challenges and Methodological Considerations
Several methodological and domain-specific challenges pervade nerve segmentation:
- Small-target and class imbalance: Nerves often occupy a small fraction of the image, resulting in class imbalance and boundary ambiguity. Customized loss weighting, focal loss, and targeted augmentations are necessary (Yves et al., 31 Jan 2026, Zhang et al., 2020).
- Device and domain variability: Cross-device generalization benefits from harmonization networks (e.g., U-Net-based enhancers in OCT), block-masked SSL, or domain mixing, but pure domain pooling can degrade performance on high-quality sources (Devalla et al., 2020, Zhang et al., 24 Jun 2025).
- Annotation ambiguity and weak supervision: For small or poorly contrasted nerves, manual labels are inconsistent or skeletonized. Models that regularize to local image structure (CRF terms), actively learn from in-the-loop corrections, or exploit weak annotation (bounding-box masks) mitigate annotation limitations (Zhang et al., 2020, Al-Battal et al., 2021, Valosek et al., 2024).
- Topology preservation: Ensuring tubular or tree-like structures are not fragmented requires explicit topology losses (clDice), wavelet-based upsampling, or skeleton supervision (Zhu et al., 2024, Li et al., 2021).
- Scalability and efficiency: High-dimensional data are partitioned into cubes/patches for training (e.g., 3D neuron reconstructions (Li et al., 2021)) or benefit from lightweight and self-supervised backbones (Zhang et al., 24 Jun 2025, Zhu et al., 2024).
- Standardization: Heterogeneity in ground truth definitions, region nomenclature, and validation metrics impedes cross-study comparability. Best practices include consensus anatomical definitions, common benchmark datasets, and standard reporting on Dice, IoU, boundary error, and specificity (Marques et al., 2021).
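The focal loss cited above as a remedy for small-target imbalance can be sketched in numpy as follows (Lin et al.'s binary formulation; the parameter values here are illustrative defaults, not from any cited nerve-segmentation work):

```python
import numpy as np

def focal_loss(probs, target, gamma=2.0, alpha=0.75, eps=1e-7):
    """Binary focal loss: the (1 - pt)^gamma factor down-weights easy, confidently
    classified background pixels so the rare nerve class dominates the gradient."""
    probs = np.clip(probs, eps, 1.0 - eps)
    pt = np.where(target == 1, probs, 1 - probs)  # probability of the true class
    w = np.where(target == 1, alpha, 1 - alpha)   # class-balancing weight
    return float(np.mean(-w * (1 - pt) ** gamma * np.log(pt)))
```

With gamma = 0 and alpha = 0.5 this reduces (up to a constant) to plain cross-entropy; increasing gamma progressively suppresses the contribution of the abundant, easy background pixels.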
7. Limitations, Open Problems, and Future Directions
Despite significant progress, several open problems remain:
- Generalization to rare pathologies and to pediatric or out-of-distribution cohorts requires domain-adaptive methods, semi-supervised learning, and routine cross-site benchmarking (Devalla et al., 2020, Marques et al., 2021).
- Automated uncertainty estimation and sample selection could further improve annotation efficiency, especially in active learning contexts (Valosek et al., 2024, Zhu et al., 2024).
- Integration of temporal and volumetric context, especially in ultrasound and MRI, may benefit from 3D/4D architectures, recurrent modules, or ensemble fusion (Hafiane et al., 2017, Wang et al., 2022).
- Explainability: Visualizing learned attention, topology compliance, or uncertainty heatmaps remains an open priority for clinical deployment (Zhang et al., 24 Jun 2025, Zhu et al., 2024).
- Topological priors and connectivity: Continued development of explicit clDice, tree structure-aware losses, and topology-preserving upsampling will be crucial for ensuring anatomical correctness (Zhu et al., 2024, Li et al., 2021).
- Integration with surgical navigation and real-time pipelines: Frame-rate constraints, reliability under motion, and device-agnostic deployment remain active areas of research (Al-Battal et al., 2021).
The field is trending towards multi-stream, self-supervised, and topology-aware architectures, guided by intensive benchmarking and close clinical collaborations. Standardized datasets, generalizable backbones, and interpretable outputs are critical for maturity and widespread adoption.