Synapse Multi-Organ CT Segmentation

Updated 21 April 2026

Advanced architectures such as MFTC-Net and SACNet improve segmentation accuracy through multi-scale and attention-based fusion; the best reported mean Dice on the Synapse benchmark exceeds 89% (MFTC-Net).
Standardized CT preprocessing and hybrid loss functions (Dice plus cross-entropy) help manage organ size variability, while semi-supervised training addresses annotation scarcity.
Boundary-aware and shape-constrained learning improves delineation of irregular organs, and lightweight designs ease clinical deployment.

Synapse multi-organ CT segmentation refers to the automated delineation of multiple anatomical structures from abdominal computed tomography (CT) scans, as benchmarked by the MICCAI Synapse multi-organ challenge. This task is foundational in computer-aided diagnosis, radiotherapy planning, and volumetric analytics. State-of-the-art approaches draw on convolutional, transformer-based, hybrid, and semi-supervised deep learning to address scale variation, organ proximity, and annotation scarcity. The best reported mean Dice on the Synapse benchmark now exceeds 89% (MFTC-Net), and several lighter models target accurate segmentation of both large and small or irregularly shaped organs under tight computational constraints.

1. Problem Definition and Benchmark

Synapse multi-organ CT segmentation targets the delineation of eight abdominal structures: aorta, gallbladder, spleen, left kidney, right kidney, liver, pancreas, and stomach. The benchmark dataset consists of 30 contrast-enhanced abdominal CT volumes, commonly split 18/12 or 24/6 patients for training and evaluation. Evaluation uses the Dice similarity coefficient (DSC) for volumetric overlap and the 95th percentile Hausdorff distance (HD95, in mm) for boundary accuracy. High performance requires precise, class-balanced segmentation that is robust to both organ size variability and intensity ambiguities.
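Both metrics can be computed per organ from binary masks. The sketch below uses one common HD95 formulation (95th percentile of the pooled symmetric surface distances, with surfaces taken as each mask minus its erosion); challenge toolkits differ in surface definition and in how empty masks are handled, so the function names and conventions here are illustrative assumptions, not the official evaluation code.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both empty: scored as perfect agreement (a convention, not universal)
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)

def surface(mask: np.ndarray) -> np.ndarray:
    """Surface voxels: the mask minus its morphological erosion."""
    return mask & ~binary_erosion(mask)

def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th percentile of symmetric surface distances, in the units of `spacing` (mm)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    if not pred.any() or not gt.any():
        return float("nan")  # undefined when either mask is empty
    sp, sg = surface(pred), surface(gt)
    # Distance from every voxel to the nearest surface voxel of each mask.
    dist_to_gt = distance_transform_edt(~sg, sampling=spacing)
    dist_to_pred = distance_transform_edt(~sp, sampling=spacing)
    dists = np.concatenate([dist_to_gt[sp], dist_to_pred[sg]])
    return float(np.percentile(dists, 95))

def mean_dice(pred_labels: np.ndarray, gt_labels: np.ndarray, labels=range(1, 9)) -> float:
    """Mean DSC over the eight organ labels (label ids 1..8 are an assumption)."""
    return float(np.mean([dice(pred_labels == c, gt_labels == c) for c in labels]))
```

For a 10-voxel cube and a copy shifted by one voxel, DSC is 0.9 and HD95 is 1 mm at unit spacing.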

2. Deep Learning Architectures for Multi-Organ Segmentation

The field has evolved from classic encoder–decoder CNNs to multi-path, attention-augmented, and transformer-hybrid architectures.

Model	Mean DSC (%)	HD95 (mm)	Key Innovations
MFTC-Net (Shabani et al., 2024)	89.73	7.31	Multi-aperture Swin transformer+3D conv fusion
SACNet (Zhang et al., 2024)	84.92	15.13	DCNv3-based adaptive receptive fields, t-vMF loss
MD-RWKV-UNet (Fang, 28 Mar 2026)	85.07	14.67	DSS + RWKV + SKA + cross-stage attention fusion
EDLDNet (Hassan et al., 23 Aug 2025)	84.00	—	PVTv2 encoder, dual-line decoder, mutation loss
OARFocalFuseNet (Srivastava et al., 2022)	81.37	—	Multi-scale/focal modulation, 3D dense fusion
FMD-TransUNet (Lu et al., 19 Sep 2025)	81.32	16.35	Frequency-domain MEWB, dual-attention DA+ block
DLUNet (Lai et al., 2022)	87.18	—	Light UNet, dual-branch semi-supervised learning
TotalSegmentator (Wasserthal et al., 2022)	94.3 (own test set, 104 structures; not Synapse)	—	3D nnU-Net, large-scale generalization

Recent SOTA models introduce multi-scale and attention-based fusion mechanisms. MFTC-Net integrates four nested Swin Transformer streams at different apertures (nested field-of-view crops), each combined with a 3D U-Net-style convolutional backbone, and merges them using Squeeze-and-Excitation and CBAM-style attention fusion blocks (Shabani et al., 2024). SACNet employs grouped deformable convolutions (DCNv3) and dynamic loss rebalancing to better cover both large organs and small, challenging ones (Zhang et al., 2024). MD-RWKV-UNet adapts receptive fields dynamically by combining deformable spatial shifts with Receptance Weighted Key Value (RWKV) units, and uses cross-stage attention to preserve boundary detail across resolution levels (Fang, 28 Mar 2026).
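Squeeze-and-Excitation (SE) recalibration, one of the fusion ingredients named for MFTC-Net, can be sketched generically: concatenate stream features on the channel axis, pool each channel globally, and gate it with a small bottleneck MLP. This is the textbook SE operation with hypothetical random weights (`se_fuse`, `w1`, `w2` are illustrative names), not MFTC-Net's actual fusion block, which also includes CBAM-style spatial attention.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def se_fuse(streams, w1: np.ndarray, w2: np.ndarray):
    """Concatenate stream features on the channel axis, then apply an SE gate.

    streams: list of arrays shaped (C_k, D, H, W) with matching spatial size.
    w1: (C // r, C) squeeze weights; w2: (C, C // r) excitation weights,
    where C = sum of C_k and r is a reduction ratio (hypothetical values).
    """
    x = np.concatenate(streams, axis=0)          # (C, D, H, W)
    s = x.mean(axis=(1, 2, 3))                   # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)                  # bottleneck FC + ReLU -> (C // r,)
    gate = sigmoid(w2 @ z)                       # per-channel weights in (0, 1)
    return x * gate[:, None, None, None], gate   # channel-recalibrated fused features
```

In a trained network `w1` and `w2` are learned, so the gate learns which aperture's channels to emphasize for each input volume.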

Further, EDLDNet demonstrates that a dual-line decoder with structured noise injection and attention-based multi-scale blocks achieves high accuracy at lower computational cost (5.6 G MACs) by only executing the noise-free stream at inference (Hassan et al., 23 Aug 2025). FMD-TransUNet advances frequency-domain feature extraction using axis-wise Fourier transforms and complements this with channel/spatial dual attention mechanisms, emphasizing improved boundary localization (Lu et al., 19 Sep 2025).

3. Training Protocols, Data Preprocessing, and Losses

Canonical pipelines clip CT intensities to a fixed Hounsfield unit (HU) window and normalize them, resample to isotropic voxels (e.g., 1 mm³), and apply on-the-fly spatial augmentations (random rotations, scaling, elastic deformation). Input sizes range from 128³ patches for 3D networks to 224×224 slices for 2D networks, depending on GPU memory and network design.
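A minimal preprocessing sketch, assuming a [-125, 275] HU abdominal window (a common choice, not a fixed standard) and voxel spacing given in the same axis order as the array. Intensities are resampled with linear interpolation, while label maps use nearest-neighbour interpolation so organ ids are never blended; the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom

# Assumed abdominal window in HU; adjust per protocol.
HU_WINDOW = (-125.0, 275.0)

def preprocess_ct(volume_hu, spacing, target_spacing=(1.0, 1.0, 1.0), hu_window=HU_WINDOW):
    """Clip to an HU window, normalize to [0, 1], and resample to target spacing (mm)."""
    lo, hi = hu_window
    v = np.clip(volume_hu.astype(np.float32), lo, hi)
    v = (v - lo) / (hi - lo)                               # window -> [0, 1]
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    v = zoom(v, factors, order=1)                          # linear resampling for intensities
    return np.clip(v, 0.0, 1.0)                            # guard against round-off

def resample_labels(labels, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Resample an integer label map with nearest-neighbour interpolation."""
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    return zoom(labels, factors, order=0)                  # keeps label ids intact
```

Augmentations (rotation, scaling, elastic deformation) are then applied on the fly to matching image/label pairs, again with nearest-neighbour sampling for the labels.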

Loss functions are almost universally hybrids, combining soft Dice and cross-entropy (often equally weighted or with data-driven weights):

$L_{total} = L_{Dice} + L_{CE}$

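$L_{Dice} = 1 - \frac{2 \sum_i p_i g_i}{\sum_i p_i + \sum_i g_i}$

where $p_i$ is the predicted foreground probability and $g_i \in \{0, 1\}$ the ground-truth label at voxel $i$; in the multi-class setting the Dice term is computed per organ and averaged.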
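The composite loss can be sketched in NumPy for a flattened volume with $K$ classes and $N$ voxels: softmax probabilities play the role of $p_i$, one-hot labels the role of $g_i$, Dice is averaged over classes, and cross-entropy over voxels. The small `eps` in the Dice denominator and the weights `w_dice`, `w_ce` are assumptions added for numerical stability and configurability; with both weights at 1 this matches $L_{total} = L_{Dice} + L_{CE}$.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    z = logits - logits.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_loss(logits, target, num_classes, eps=1e-6, w_dice=1.0, w_ce=1.0):
    """Soft Dice (class-averaged) + cross-entropy.

    logits: (K, N) per-voxel class scores; target: (N,) integer labels.
    """
    p = softmax(logits, axis=0)                          # p_i for every class
    g = np.eye(num_classes)[target].T                    # one-hot g_i, shape (K, N)
    inter = (p * g).sum(axis=1)
    dice_per_class = 1.0 - 2.0 * inter / (p.sum(axis=1) + g.sum(axis=1) + eps)
    l_dice = dice_per_class.mean()
    p_true = p[target, np.arange(target.size)]           # probability of the true class
    l_ce = -np.log(np.clip(p_true, eps, 1.0)).mean()
    return float(w_dice * l_dice + w_ce * l_ce)
```

With uniform logits over three present classes, the Dice term is 2/3 and the cross-entropy term is log 3; confident correct logits drive both terms toward zero.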

Advanced frameworks introduce shape- or boundary-aware penalties. MFTC-Net augments the basic composite with a surface-aware distance transform penalty, which sharpens predicted organ boundaries by penalizing misalignment at the organ–background interface (Shabani et al., 2024). SACNet integrates a t-vMF Dice term, dynamically adjusting angular (cosine) compactness per class based on per-epoch IoU, compensating for the natural Dice imbalance between large and small organs (Zhang et al., 2024). EDLDNet's “mutation” loss fuses predictions from both clean and noisy decoders across multiple scales using the powerset of output maps, regularizing the network to be robust under input or feature perturbations (Hassan et al., 23 Aug 2025).
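The powerset idea behind a "mutation"-style loss can be sketched as follows: given $n$ output maps (for example, hypothetically, clean and noisy decoder outputs at two scales), every non-empty subset is aggregated and penalized, giving $2^n - 1$ loss terms. Element-wise summation as the aggregation and a pluggable per-map `loss_fn` are assumptions here; EDLDNet's exact combination rule and per-term loss may differ.

```python
import itertools
import numpy as np

def subset_sums(outputs):
    """Yield the element-wise sum of every non-empty subset of output maps (2**n - 1 of them)."""
    for r in range(1, len(outputs) + 1):
        for subset in itertools.combinations(outputs, r):
            yield np.sum(subset, axis=0)

def powerset_loss(outputs, target, loss_fn):
    """Sum loss_fn over all subset aggregates; loss_fn is any per-map loss (e.g., Dice + CE)."""
    return sum(loss_fn(agg, target) for agg in subset_sums(outputs))
```

With four outputs this yields 15 loss terms, so every combination of clean and perturbed predictions is pushed toward the ground truth.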

Semi-supervised methods (DLUNet, DMPCT) exploit large pools of unlabeled CTs, either by training dual lightweight networks with cross-pseudo supervision or by multi-planar co-training with pseudo-label generation, boosting DSC by 2–4% in annotation-scarce regimes (Zhou et al., 2018, Lai et al., 2022).
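Cross-pseudo supervision on unlabeled voxels can be sketched generically: each of two branches is trained with cross-entropy against the other branch's hard (argmax) predictions. This is the standard formulation with hypothetical function names, not necessarily DLUNet's exact objective; in a real framework the pseudo-labels are detached from the gradient graph, and this term is added to the supervised loss on labeled scans.

```python
import numpy as np

def cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-8) -> float:
    """probs: (K, N) class probabilities; labels: (N,) integer labels."""
    p_true = probs[labels, np.arange(labels.size)]
    return float(-np.log(np.clip(p_true, eps, 1.0)).mean())

def cross_pseudo_supervision(probs_a: np.ndarray, probs_b: np.ndarray) -> float:
    """Each branch learns from the other's argmax pseudo-labels on unlabeled voxels."""
    pseudo_a = probs_a.argmax(axis=0)   # treated as constants (no gradient) in practice
    pseudo_b = probs_b.argmax(axis=0)
    return cross_entropy(probs_a, pseudo_b) + cross_entropy(probs_b, pseudo_a)
```

The loss is zero when both branches agree confidently and grows as they disagree, which encourages consistent predictions on unlabeled data.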

4. Multi-Scale, Attention, and Frequency-Domain Fusion Strategies

Multi-scale fusion is essential to capture both the global context for large organs (e.g., liver, spleen) and fine, boundary-level detail for small or irregularly shaped targets (e.g., pancreas, gallbladder). Key strategies include:

Multi-aperture transformers: MFTC-Net processes input CT crops at nested spatial scales, merging global and local context via parallel Swin Transformer streams and convs before upsampling into a shared decoder (Shabani et al., 2024).
Dense scale fusion: OARFocalFuseNet and 3D-MSF aggregate feature maps from multiple encoder stages, promoting rich inter-scale context exchange. OARFocalFuseNet’s “focal modulation” blocks enable the network to weigh global versus local features at each voxel, boosting resilience to organ size and morphology variability (Srivastava et al., 2022).
Axis-wise frequency-domain fusion: FMD-TransUNet extracts low-frequency (global) and high-frequency (boundary/edge) cues along multiple axes via DFTs and projects them jointly for complementary spatial-domain fusion, yielding finer contour delineation (Lu et al., 19 Sep 2025).
Dynamic adaptive receptive fields: SACNet's groupwise DCNv3 blocks and MD-RWKV-UNet's deformable spatial shifts enable per-organ adaptation not just by scale but also by learned local structural regularity, further refined by attention fusion modules (Zhang et al., 2024, Fang, 28 Mar 2026).
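The axis-wise frequency split can be illustrated with a fixed low-pass mask on the 1D DFT along each axis: the retained low frequencies carry smooth, global structure, and the residual carries edges. The fixed `cutoff` (in cycles per voxel) and the function names are assumptions for illustration; FMD-TransUNet's MEWB learns its frequency processing rather than using a hard threshold.

```python
import numpy as np

def axis_frequency_split(x: np.ndarray, axis: int, cutoff: float = 0.1):
    """Split x into low- and high-frequency parts along one axis via the DFT."""
    spec = np.fft.fft(x, axis=axis)
    freqs = np.fft.fftfreq(x.shape[axis])       # cycles per sample, in [-0.5, 0.5)
    shape = [1] * x.ndim
    shape[axis] = -1
    low_mask = (np.abs(freqs) <= cutoff).reshape(shape)  # symmetric mask keeps output real
    low = np.fft.ifft(spec * low_mask, axis=axis).real   # global / smooth component
    high = x - low                                        # boundary / edge residual
    return low, high

def multi_axis_split(x: np.ndarray, cutoff: float = 0.1):
    """Apply the split independently along every axis of a volume."""
    return [axis_frequency_split(x, ax, cutoff) for ax in range(x.ndim)]
```

A constant volume lands entirely in the low band, while a voxel-wise alternating pattern lands entirely in the high band.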

5. Boundary-Aware, Shape-Constrained, and Complementary-Task Learning

Boundary consistency is a principal failure mode in multi-organ CT segmentation, particularly where organ boundaries exhibit weak contrast or complex geometry.

Auxiliary boundary prediction: Methods such as the boundary-constrained 3D U-Net of Irshad et al. optimize for both the segmentation mask and organ boundary maps, with boundary ground truth derived from the masks by morphological erosion. This multi-task learning improves mean DSC by up to 3.6%, primarily by reducing boundary errors (Irshad et al., 2022).
Shape priors via task heads: Complementary-task learning trains networks to regress signed Euclidean distances to organ boundaries and predict explicit contour maps in parallel with the segmentation mask, raising plausibility of segmented shapes—a significant advantage for small or high-surface/volume-ratio organs (Navarro et al., 2019).
Surface-aware losses and modules: DistLoss (MFTC-Net) penalizes misalignment at organ interfaces directly, while the DA+ dual-attention block (FMD-TransUNet) emphasizes boundary features architecturally; both are reported to reduce HD95 by more than 15 mm relative to their baselines (Shabani et al., 2024, Lu et al., 19 Sep 2025).
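The auxiliary targets above can be derived from an ordinary organ mask. The sketch below builds a boundary map as the mask minus its erosion, and a signed Euclidean distance map that is negative inside the organ and positive outside (one common sign convention; published methods may normalize or clip it). Function names are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def boundary_map(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Boundary ground truth: the mask minus its morphological erosion."""
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask, iterations=iterations)

def signed_distance(mask: np.ndarray, spacing=None) -> np.ndarray:
    """Signed Euclidean distance: negative inside the organ, positive outside."""
    mask = mask.astype(bool)
    if not mask.any() or mask.all():
        return np.zeros(mask.shape)  # undefined without both inside and outside voxels
    dist_out = distance_transform_edt(~mask, sampling=spacing)  # outside: distance to organ
    dist_in = distance_transform_edt(mask, sampling=spacing)    # inside: distance to background
    return dist_out - dist_in
```

A regression head trained on `signed_distance` targets, alongside the segmentation head, penalizes shapes whose implied contours drift from the true boundary.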

6. Computational Efficiency, Scaling, and Clinical Deployment

Efficiency is achieved through network depth/width balancing (SACNet’s “widenet” approach), decoder weight sharing, depthwise separable convolutions, and executing dual or ensemble branches only during training (e.g., EDLDNet). SACNet reports a favorable trade-off: 49M parameters, real-time inference, and lower resource demands than heavier transformer hybrids (Zhang et al., 2024, Hassan et al., 23 Aug 2025).
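The saving from depthwise separable convolutions is simple arithmetic: a standard 3D convolution needs $k^3 C_{in} C_{out}$ weights, whereas a depthwise $k^3$ filter per input channel followed by a $1^3$ pointwise convolution needs $k^3 C_{in} + C_{in} C_{out}$. The helper names below are illustrative.

```python
def conv3d_params(c_in: int, c_out: int, k: int = 3, bias: bool = False) -> int:
    """Weights in a standard k×k×k 3D convolution."""
    return k**3 * c_in * c_out + (c_out if bias else 0)

def separable_conv3d_params(c_in: int, c_out: int, k: int = 3, bias: bool = False) -> int:
    """Weights in a depthwise k×k×k convolution followed by a 1×1×1 pointwise one."""
    depthwise = k**3 * c_in          # one spatial filter per input channel
    pointwise = c_in * c_out         # channel mixing
    return depthwise + pointwise + ((c_in + c_out) if bias else 0)
```

For a 64-to-64-channel layer with $k = 3$, this gives 110,592 versus 5,824 weights, roughly a 19× reduction.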

Generalization and deployment are further demonstrated by TotalSegmentator, which segments 104 structures in CT (including all Synapse organs) by scaling nnU-Net pipelines to a large multi-site, multi-protocol dataset (more than 1,200 CT scans; Dice = 0.943 on its own test set) (Wasserthal et al., 2022). Local deployment is eased by straightforward fine-tuning, a choice of model resolution, and support for both GPU and CPU execution.

7. Limitations, Open Problems, and Future Directions

Despite high mean DSC scores, current SOTA methods face key obstacles:

Small and irregular organs (e.g., pancreas, gallbladder) remain challenging: even the best per-organ DSCs for these structures can lag those of large organs by more than 20 percentage points (Zhang et al., 2024, Lu et al., 19 Sep 2025).
Real-time 3D inference is hampered by patch-wise architectures and sliding window policies; further optimization or streaming architectures may be required (Shabani et al., 2024).
Robustness to domain shift (contrast phase, scanner, population) is non-trivial. While cross-dataset transfer is feasible, modest accuracy deterioration is observed, motivating more systematic domain adaptation and pretraining strategies (Crespi et al., 2023, Wasserthal et al., 2022).
Combining explicit shape constraints, self-supervised representation learning, and boundary-aware auxiliary heads is a promising but incompletely explored direction.
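The cost of sliding-window inference grows with the number of overlapping patches evaluated per volume. The sketch below computes patch start indices along each axis, appending a final border-flush patch so the whole volume is covered; the 50% default overlap and the function names are assumptions, and volumes smaller than the patch would need padding.

```python
def window_starts(size: int, patch: int, overlap: float = 0.5) -> list:
    """Start indices of patches covering [0, size) along one axis."""
    if patch >= size:
        return [0]  # volume smaller than the patch: pad before inference
    stride = max(1, int(patch * (1 - overlap)))
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # final patch flush with the border
    return starts

def num_patches(shape, patch, overlap: float = 0.5) -> int:
    """Total number of patches a sliding-window pass evaluates for one volume."""
    n = 1
    for s, p in zip(shape, patch):
        n *= len(window_starts(s, p, overlap))
    return n
```

For example, a 256×256×160 volume with 128³ patches at 50% overlap requires 18 forward passes, which illustrates why patch-wise 3D inference is hard to run in real time.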

Proposed avenues include dynamic transformer designs for reduced FLOPs (MFTC-Net), self-supervised pretraining (to mitigate annotation scarcity), 3D volumetric extensions of successful 2D networks (SACNet), and uncertainty quantification for clinical trust (Shabani et al., 2024, Zhang et al., 2024).

References:

"Multi-Aperture Fusion of Transformer-Convolutional Network (MFTC-Net) for 3D Medical Image Segmentation and Visualization" (Shabani et al., 2024)
"SACNet: A Spatially Adaptive Convolution Network for 2D Multi-organ Medical Segmentation" (Zhang et al., 2024)
"MD-RWKV-UNet: Scale-Aware Anatomical Encoding with Cross-Stage Fusion for Multi-Organ Segmentation" (Fang, 28 Mar 2026)
"An Efficient Dual-Line Decoder Network with Multi-Scale Convolutional Attention for Multi-organ Segmentation" (Hassan et al., 23 Aug 2025)
"An Efficient Multi-Scale Fusion Network for 3D Organ at Risk (OAR) Segmentation" (Srivastava et al., 2022)
"FMD-TransUNet: Abdominal Multi-Organ Segmentation Based on Frequency Domain Multi-Axis Representation Learning and Dual Attention Mechanisms" (Lu et al., 19 Sep 2025)
"DLUNet: Semi-supervised Learning based Dual-Light UNet for Multi-organ Segmentation" (Lai et al., 2022)
"TotalSegmentator: robust segmentation of 104 anatomical structures in CT images" (Wasserthal et al., 2022)
"Ensemble Methods for Multi-Organ Segmentation in CT Series" (Crespi et al., 2023)
"Shape-Aware Complementary-Task Learning for Multi-Organ Segmentation" (Navarro et al., 2019)
"Improved Abdominal Multi-Organ Segmentation via 3D Boundary-Constrained Deep Neural Networks" (Irshad et al., 2022)
"Semi-Supervised Multi-Organ Segmentation via Deep Multi-Planar Co-Training" (Zhou et al., 2018)