UNet++: A Nested U-Net Architecture for Medical Image Segmentation (1807.10165v1)

Published 18 Jul 2018 in cs.CV, cs.LG, eess.IV, and stat.ML

Abstract: In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The re-designed skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub-networks. We argue that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. We have evaluated UNet++ in comparison with U-Net and wide U-Net architectures across multiple medical image segmentation tasks: nodule segmentation in the low-dose CT scans of chest, nuclei segmentation in the microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Our experiments demonstrate that UNet++ with deep supervision achieves an average IoU gain of 3.9 and 3.4 points over U-Net and wide U-Net, respectively.

Citations (5,245)


Summary

The paper introduces UNet++, a novel nested U-Net architecture that uses dense skip connections and deep supervision to reduce the semantic gap between encoder and decoder.
With deep supervision, it achieves an average IoU gain of 3.9 points over U-Net and 3.4 points over wide U-Net, and it can be pruned at inference time to trade a small amount of accuracy for speed.
The approach offers enhanced precision for clinical imaging and paves the way for real-time segmentation and transfer learning applications in medical diagnostics.

UNet++: A Nested U-Net Architecture for Medical Image Segmentation

The paper presents UNet++, an advanced neural network architecture developed to enhance medical image segmentation accuracy. Building upon the standard U-Net architecture, UNet++ incorporates nested and dense skip pathways, along with deep supervision mechanisms, to address the semantic disparity between encoder and decoder feature maps, which typically hinders the performance of traditional U-Net models.

Architectural Innovations

UNet++ introduces significant modifications to the original U-Net structure:

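Re-designed Skip Pathways: Unlike U-Net, where feature maps from the encoder are directly transferred to the decoder via skip connections, UNet++ uses a series of nested dense convolutional blocks. These pathways progressively adapt the feature maps, thereby reducing the semantic gap between encoder and decoder. With x^{i,j} denoting the output of the node at down-sampling level i and position j along the skip pathway, \mathcal{H} a convolution followed by an activation function, \mathcal{U} an up-sampling layer, and [ ] concatenation: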

x^{i,j}=\begin{cases}
\mathcal{H}\left(x^{i-1,j}\right), & j=0 \\
\mathcal{H}\left(\left[\left[x^{i,k}\right]_{k=0}^{j-1}, \mathcal{U}\left(x^{i+1,j-1}\right)\right]\right), & j>0
\end{cases}
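
To make the indexing in the equation concrete, the following minimal Python sketch lists which nodes feed a given node x^{i,j}. The function name `node_inputs` and the labels `down`, `skip`, `up`, and `image` are illustrative assumptions, not names from the paper; the sketch only enumerates connections and performs no convolution.

```python
def node_inputs(i, j):
    """List the nodes that feed x^{i,j} according to the skip-pathway equation.

    Each entry is (kind, (row, col)). The kind labels are hypothetical:
    'down'  = output of the previous encoder level,
    'skip'  = same-level node reached through a dense skip connection,
    'up'    = node one level below, passed through the up-sampling layer U,
    'image' = the raw network input (only for x^{0,0}).
    """
    if j == 0:
        # Encoder column: a single input from the level above.
        return [("down", (i - 1, 0))] if i > 0 else [("image", None)]
    # Dense skip pathway: all earlier nodes on the same level ...
    same_level = [("skip", (i, k)) for k in range(j)]
    # ... concatenated with the up-sampled node from the level below.
    return same_level + [("up", (i + 1, j - 1))]


print(node_inputs(0, 1))
print(node_inputs(0, 3))
```

Under this reading, a node with j > 0 concatenates j + 1 feature maps before applying \mathcal{H}: j from the same level and one up-sampled from below.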

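Deep Supervision: Deep supervision is applied at multiple intermediate layers, which not only aids training by providing gradient signals at various depths but also allows selective pruning of the network during inference. The loss attached to each supervised output combines binary cross-entropy with the Dice coefficient, where Y_b and \hat{Y}_b are the ground truth and predicted probabilities and N is the batch size: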

\mathcal{L}(Y,\hat{Y}) = -\frac{1}{N}\sum_{b=1}^{N}{\left(\frac{1}{2}\cdot Y_b\cdot\log{\hat{Y}_b}+\frac{2\cdot Y_b\cdot \hat{Y}_b}{Y_b+\hat{Y}_b}\right)}
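
As a rough illustration, the loss can be evaluated term by term in plain Python. This is a sketch under stated assumptions rather than the authors' implementation: it treats each Y_b and \hat{Y}_b as a single scalar, and the function name `unetpp_loss` and the `eps` clamp (added so the logarithm stays finite) are not part of the formula above.

```python
import math


def unetpp_loss(y_true, y_pred, eps=1e-7):
    """Evaluate the combined cross-entropy / Dice loss from the formula above.

    y_true: ground-truth labels in {0, 1}
    y_pred: predicted probabilities in [0, 1]
    eps:    assumed clamp, not in the original formula, to avoid log(0)
    """
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)      # keep log and division well defined
        ce_term = 0.5 * y * math.log(p)      # (1/2) * Y_b * log(Y_hat_b)
        dice_term = 2.0 * y * p / (y + p)    # 2 * Y_b * Y_hat_b / (Y_b + Y_hat_b)
        total += ce_term + dice_term
    return -total / n


good = unetpp_loss([1.0, 1.0], [0.99, 0.99])
poor = unetpp_loss([1.0, 1.0], [0.50, 0.50])
print(good < poor)
```

Because of the leading minus sign, the loss approaches -1 for a perfect foreground prediction and increases as predictions degrade.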

Experimental Validation

The efficacy of UNet++ was rigorously evaluated across multiple datasets, including lung nodule segmentation in CT scans, colon polyp segmentation in videos, liver segmentation in CT, and cell nuclei segmentation in microscopy images. Each dataset was carefully curated to ensure robustness in segmentation tasks.

Comparison with Baseline Models: The experiments showed that UNet++ with deep supervision outperformed both the original U-Net and a customized wide U-Net (designed to have a parameter count comparable to that of UNet++). On average, UNet++ with deep supervision achieved an IoU gain of 3.9 points over U-Net and 3.4 points over the wide U-Net.
Model Pruning: Deep supervision allows the network to be pruned at different levels, trading segmentation accuracy for computational efficiency. Pruning UNet++ to level $L^{3}$ reduced inference time by 32.2% while lowering IoU by only 0.6 points.
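
The pruning idea can be sketched by enumerating which nodes survive at each level. The sketch assumes a five-level layout (encoder rows i = 0..4), in which the output of UNet++ $L^{d}$ is read from node x^{0,d} and that node depends only on nodes with i + j ≤ d; the helper names `pruned_nodes` and `output_node` are hypothetical.

```python
def pruned_nodes(d):
    """Nodes (i, j) needed to compute x^{0,d}, i.e. all nodes with i + j <= d."""
    return [(i, j) for i in range(d + 1) for j in range(d + 1 - i)]


def output_node(d):
    """Node whose supervised output is used when the network is pruned to level d."""
    return (0, d)


full = set(pruned_nodes(4))    # unpruned network under the five-level assumption
level3 = set(pruned_nodes(3))  # network pruned to L^3
print(len(full), len(level3))
print(sorted(full - level3))   # nodes that pruning to L^3 removes
```

Under these assumptions, pruning from $L^{4}$ to $L^{3}$ drops the deepest encoder node and the last node on every skip pathway, which is where the inference-time saving comes from.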

Implications and Future Directions

The proposed UNet++ architecture significantly advances the accuracy and reliability of medical image segmentation, a critical requirement in clinical settings where precision is paramount. By narrowing the semantic gap between the encoder and decoder, UNet++ facilitates better feature classification and boundary delineation, crucial for detecting pathological structures.

From a theoretical perspective, the nested dense skip pathways introduce a novel paradigm for mitigating feature disparity in encoder-decoder networks, potentially applicable to other domains beyond medical imaging. The deep supervision mechanism not only accelerates training by offering multiple gradient pathways but also enables dynamic network scaling based on computational constraints.

Future research could explore various avenues:

Integration with Meta Frameworks: Incorporating UNet++ as the backbone architecture in complex frameworks like Mask-RCNN could further enhance segmentation performance for instance segmentation tasks.
Transfer Learning: Leveraging pre-trained models on diverse medical imaging datasets could yield improvements in training convergence and overall segmentation accuracy.
Real-Time Segmentation: Optimizing the network architecture for real-time applications, particularly in intraoperative settings, could revolutionize surgical assistance systems.

UNet++ stands as a landmark contribution to medical image segmentation, offering a blend of improved accuracy, computational efficiency, and flexibility, paving the way for more robust and reliable clinical decision support systems.
