- The paper introduces full-scale skip connections and deep supervision to capture multi-scale features, resulting in improved segmentation accuracy.
- Experimental results demonstrate that UNet 3+ outperforms UNet and UNet++ with higher Dice coefficients and fewer model parameters on liver and spleen segmentation tasks.
- The study validates a novel hybrid loss function and classification-guided module that effectively reduce false positives and enhance boundary detection.
UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation
The paper presents UNet 3+, a novel architecture for medical image segmentation that seeks to improve on existing UNet variants. UNet 3+ introduces several key improvements, including full-scale skip connections and deep supervision, to provide a more robust segmentation solution across organs of varying scales.
Core Contributions
UNet 3+ builds upon the foundational UNet, known for its encoder-decoder architecture, and its subsequent iteration UNet++, addressing their limited ability to exploit full-scale features. The core contributions of UNet 3+ are as follows:
- Full-Scale Skip Connections: Unlike the plain skip connections of UNet or the nested, dense skip connections of UNet++, UNet 3+ uses full-scale skip connections that aggregate features from every scale. Each decoder stage merges low-level spatial detail from the encoder with high-level semantic information from deeper stages, enhancing the contextual understanding essential for precise segmentation (a minimal sketch of one such decoder stage follows this list).
- Deep Supervision: The architecture employs full-scale deep supervision, in which every decoder stage produces a side output that is upsampled to the input resolution and supervised directly by the ground truth, so hierarchical representations are learned from feature maps at every scale.
- Hybrid Loss Function: A hybrid loss combining focal, MS-SSIM, and IoU losses optimizes the network at the pixel, patch, and map levels respectively, emphasizing boundary accuracy as well as overall segmentation quality.
- Classification-Guided Module (CGM): This module predicts whether an input image contains the target organ and gates the segmentation outputs accordingly, reducing false positives and over-segmentation on images that do not contain the organ (a sketch of this gating appears after the decoder-stage example below).
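To make the full-scale aggregation concrete, here is a minimal PyTorch sketch of a single decoder stage and its deep-supervision head. It is an illustrative reconstruction rather than the authors' released code: names such as `FullScaleDecoderStage` and `DeepSupervisionHead`, the 64-channel branch width, and the use of adaptive max-pooling and bilinear interpolation to align scales are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FullScaleDecoderStage(nn.Module):
    """One UNet 3+-style decoder stage: finer-scale encoder maps are shrunk by
    max-pooling and coarser-scale decoder maps are enlarged by bilinear
    upsampling; each branch passes through a 3x3 conv before a conv-BN-ReLU fusion."""

    def __init__(self, in_channels_list, branch_channels=64):
        super().__init__()
        # One 3x3 conv per incoming feature map, all mapped to the same width.
        self.branches = nn.ModuleList(
            [nn.Conv2d(c, branch_channels, kernel_size=3, padding=1)
             for c in in_channels_list]
        )
        fused = branch_channels * len(in_channels_list)
        self.fuse = nn.Sequential(
            nn.Conv2d(fused, fused, kernel_size=3, padding=1),
            nn.BatchNorm2d(fused),
            nn.ReLU(inplace=True),
        )

    def forward(self, features, target_size):
        # For the third decoder stage of a 5-level network, `features` would hold
        # the encoder maps at scales 1-3 plus the decoder maps at scales 4-5.
        resized = []
        for conv, f in zip(self.branches, features):
            if tuple(f.shape[-2:]) != tuple(target_size):
                if f.shape[-1] > target_size[-1]:
                    # Finer-scale encoder map: shrink with max-pooling.
                    f = F.adaptive_max_pool2d(f, target_size)
                else:
                    # Coarser-scale decoder map: enlarge by bilinear upsampling.
                    f = F.interpolate(f, size=target_size, mode="bilinear",
                                      align_corners=False)
            resized.append(conv(f))
        return self.fuse(torch.cat(resized, dim=1))


class DeepSupervisionHead(nn.Module):
    """Side output for one decoder stage: 3x3 conv to the class map, bilinear
    upsampling to the input resolution, sigmoid; each side output is then
    compared against the ground-truth mask."""

    def __init__(self, in_channels, num_classes=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_classes, kernel_size=3, padding=1)

    def forward(self, x, image_size):
        logits = F.interpolate(self.conv(x), size=image_size, mode="bilinear",
                               align_corners=False)
        return torch.sigmoid(logits)
```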
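Similarly, a hedged sketch of how the classification-guided module might gate the side outputs; this is an illustration under assumed layer choices (dropout rate, 1x1 conv, softmax over two classes), not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassificationGuidedModule(nn.Module):
    """Predicts organ presence from the deepest encoder features; the resulting
    binary gate can suppress segmentation output on images predicted to be organ-free."""

    def __init__(self, in_channels, p_dropout=0.5):
        super().__init__()
        self.dropout = nn.Dropout2d(p_dropout)
        self.conv = nn.Conv2d(in_channels, 2, kernel_size=1)  # 2 classes: absent / present

    def forward(self, deepest_features):
        x = self.conv(self.dropout(deepest_features))
        x = F.adaptive_max_pool2d(x, 1).flatten(1)   # (B, 2) classification logits
        probs = torch.softmax(x, dim=1)
        # Hard gate: 1.0 when class 1 ("organ present") wins, else 0.0.
        gate = probs.argmax(dim=1, keepdim=True).float()
        return probs, gate


def gate_side_outputs(side_outputs, gate):
    """Multiply every deep-supervision side output by the per-image presence gate."""
    return [s * gate.view(-1, 1, 1, 1) for s in side_outputs]
```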
Experimental Validation
The authors validated UNet 3+ on liver and spleen segmentation using the ISBI LiTS 2017 Challenge dataset and a hospital-collected spleen dataset. They performed comprehensive experiments comparing UNet 3+ with UNet and UNet++ using two backbones: VGG-16 and ResNet-101.
Table 1: Dice Coefficient Comparison
| Architecture   | Params | VGG-16 Dice | ResNet-101 Dice |
|----------------|--------|-------------|-----------------|
| UNet           | 39.39M | 0.9114      | 0.9360          |
| UNet++         | 47.18M | 0.9254      | 0.9449          |
| UNet 3+ w/o DS | 26.97M | 0.9460      | 0.9559          |
| UNet 3+        | 26.97M | 0.9523      | 0.9580          |
The results show that UNet 3+ outperforms both UNet and UNet++ with either backbone, achieving higher Dice coefficients while using fewer parameters (26.97M versus 39.39M for UNet and 47.18M for UNet++).
State-of-the-Art Comparisons
UNet 3+ was also compared with other state-of-the-art approaches, including PSPNet, DeepLab variants such as DeepLabV3+, and Attention UNet. The following results summarize the performance:
Table 2: Quantitative Comparison Results (Dice Coefficient)
| Method                    | Liver Dice | Spleen Dice |
|---------------------------|------------|-------------|
| PSPNet                    | 0.9217     | 0.9312      |
| DeepLabV3+                | 0.9290     | 0.9367      |
| Attention UNet            | 0.9341     | 0.9458      |
| UNet 3+ (Hybrid loss)     | 0.9588     | 0.9620      |
| UNet 3+ (Hybrid loss+CGM) | 0.9675     | 0.9675      |
UNet 3+, particularly when augmented with the hybrid loss function and the CGM, outperforms these existing methods. The improvements in boundary detection and overall segmentation accuracy position UNet 3+ as a highly effective architecture for medical image segmentation tasks.
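Since the hybrid loss accounts for much of this gain, the following is a rough sketch of how the three terms described earlier might be combined. It is an illustrative reconstruction, not the authors' code: the MS-SSIM term is assumed to come from the third-party `pytorch_msssim` package, and the focal and IoU terms are written in common simplified forms.

```python
import torch
from pytorch_msssim import MS_SSIM  # pip install pytorch-msssim (assumed dependency)

# Patch-level term: multi-scale structural similarity on single-channel masks.
ms_ssim_module = MS_SSIM(data_range=1.0, channel=1)


def focal_loss(pred, target, gamma=2.0, eps=1e-7):
    """Pixel-level term: down-weights easy pixels so training focuses on hard ones."""
    pred = pred.clamp(eps, 1.0 - eps)
    pt = torch.where(target > 0.5, pred, 1.0 - pred)
    return (-(1.0 - pt) ** gamma * pt.log()).mean()


def iou_loss(pred, target, eps=1e-7):
    """Map-level term: penalizes the mismatch between the whole predicted region
    and the ground-truth region via a soft IoU."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = (pred + target - pred * target).sum(dim=(1, 2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()


def hybrid_loss(pred, target):
    """pred and target are (B, 1, H, W) tensors with values in [0, 1]."""
    ms_ssim_term = 1.0 - ms_ssim_module(pred, target)
    return focal_loss(pred, target) + ms_ssim_term + iou_loss(pred, target)


# With deep supervision, the loss would typically be summed over every side output:
# total = sum(hybrid_loss(side, target) for side in side_outputs)
```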
Implications and Future Directions
The paper confirms that effectively incorporating multi-scale features and hierarchical supervision significantly improves segmentation accuracy in medical imaging. The architectural strategies introduced by UNet 3+ may also benefit other domains that require precise boundary detection and segmentation consistency.
Future developments may focus on further reducing computational load, integrating advanced attention mechanisms, and exploring the adaptation of UNet 3+ to other medical imaging modalities such as MRI or ultrasound. Additionally, leveraging transfer learning techniques might expedite training processes and adapt the architecture to a broader array of medical conditions.
In conclusion, UNet 3+ represents a substantive step forward in medical image segmentation, offering enhanced accuracy and efficiency through its comprehensive and innovative architectural design.