DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation (2006.04868v2)

Published 8 Jun 2020 in eess.IV and cs.CV

Abstract: Semantic image segmentation is the process of labeling each pixel of an image with its corresponding class. An encoder-decoder based approach, like U-Net and its variants, is a popular strategy for solving medical image segmentation tasks. To improve the performance of U-Net on various segmentation tasks, we propose a novel architecture called DoubleU-Net, which is a combination of two U-Net architectures stacked on top of each other. The first U-Net uses a pre-trained VGG-19 as the encoder, which has already learned features from ImageNet and can be transferred to another task easily. To capture more semantic information efficiently, we added another U-Net at the bottom. We also adopt Atrous Spatial Pyramid Pooling (ASPP) to capture contextual information within the network. We have evaluated DoubleU-Net using four medical segmentation datasets, covering various imaging modalities such as colonoscopy, dermoscopy, and microscopy. Experiments on the MICCAI 2015 segmentation challenge, the CVC-ClinicDB, the 2018 Data Science Bowl challenge, and the Lesion boundary segmentation datasets demonstrate that the DoubleU-Net outperforms U-Net and the baseline models. Moreover, DoubleU-Net produces more accurate segmentation masks, especially in the case of the CVC-ClinicDB and MICCAI 2015 segmentation challenge datasets, which have challenging images such as smaller and flat polyps. These results show the improvement over the existing U-Net model. The encouraging results, produced on various medical image segmentation datasets, show that DoubleU-Net can be used as a strong baseline for both medical image segmentation and cross-dataset evaluation testing to measure the generalizability of Deep Learning (DL) models.

Authors (5)

Debesh Jha (78 papers)
Michael A. Riegler (60 papers)
Dag Johansen (19 papers)
Pål Halvorsen (69 papers)
Håvard D. Johansen (16 papers)

Citations (512)


Summary

DoubleU-Net: A New Paradigm for Medical Image Segmentation

The paper "DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation" introduces an architecture designed to improve performance on medical image segmentation tasks. DoubleU-Net stacks two U-Net architectures in sequence. It differs from a conventional U-Net in that the first network uses a pre-trained VGG-19 as its encoder, and Atrous Spatial Pyramid Pooling (ASPP) is used to capture contextual information within the network.

Key Contributions and Methodology

DoubleU-Net extends the traditional U-Net by stacking two U-Net networks in sequence. The first network uses a pre-trained VGG-19 encoder, so features already learned on ImageNet are transferred to the segmentation task. The second U-Net is added to capture more semantic information, and ASPP is used to capture contextual information within the network. The architecture also uses squeeze-and-excite blocks, which refine feature maps by emphasizing relevant information.
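To make the idea behind ASPP more concrete, the following is a minimal one-dimensional sketch in plain Python, not the authors' implementation. It applies the same small kernel at several dilation (atrous) rates, so each branch sees a different amount of context, and then merges the branches. The function names `dilated_conv1d` and `aspp_1d`, the rates, and the averaging step are illustrative assumptions; ASPP as commonly described works on 2D feature maps and typically concatenates the branches before a 1x1 convolution.

```python
def dilated_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1D cross-correlation with dilation `rate`.

    With rate=1 this is an ordinary convolution; larger rates spread the
    kernel taps apart, widening the receptive field without adding weights.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Toy ASPP: run parallel dilated branches and merge them.

    Assumption for illustration: branches are averaged element-wise.
    A real ASPP block would concatenate channels and apply a 1x1 convolution.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(values) / len(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    box = [1, 1, 1]
    print(dilated_conv1d(impulse, box, rate=1))  # narrow response around the impulse
    print(dilated_conv1d(impulse, box, rate=2))  # same kernel, wider spacing
    print(aspp_1d(impulse, box, rates=(1, 2)))   # merged multi-scale response
```

The same three weights cover a span of three samples at rate 1 and five samples at rate 2, which is how parallel atrous branches gather context at several scales at no extra parameter cost per branch.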

The proposed model is evaluated on four publicly available medical segmentation datasets covering colonoscopy, dermoscopy, and microscopy: the MICCAI 2015 segmentation challenge, CVC-ClinicDB, the 2018 Data Science Bowl challenge, and the Lesion boundary segmentation dataset. On these datasets DoubleU-Net outperforms U-Net and the baseline models, and it produces more accurate masks in particular on CVC-ClinicDB and MICCAI 2015, which contain challenging images such as small and flat polyps.

Experimental Outcomes

The evaluation shows that DoubleU-Net produces more accurate segmentation masks than the compared baselines. Reported metrics include a Dice Similarity Coefficient (DSC) of 0.7649 on the MICCAI 2015 dataset and 0.9239 on CVC-ClinicDB. The architecture is also assessed through cross-dataset testing, which supports its use as a strong baseline for medical image segmentation.
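For readers unfamiliar with the metric, the Dice Similarity Coefficient between a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|). The snippet below is a small plain-Python sketch of that formula for binary masks. The function name `dice_coefficient` and the choice to return 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for two flat binary masks (sequences of 0/1).

    DSC = 2 * |pred AND target| / (|pred| + |target|)
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    truth = [1, 0, 1, 0]
    # One overlapping pixel, two foreground pixels in each mask: 2*1 / (2+2)
    print(dice_coefficient(predicted, truth))  # 0.5
```

A DSC of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels.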

Implications and Future Directions

The results suggest that encoders pre-trained on large datasets such as ImageNet are valuable when annotated medical data is limited. More accurate segmentation could in turn support diagnosis and treatment planning, with potential benefits for the efficiency and accuracy of clinical workflows.

The paper also contributes to the discussion of model generalization and robustness. DoubleU-Net can serve as a baseline architecture for cross-dataset evaluation, which measures the generalizability that models intended for clinical use require.

Future research may focus on reducing the architectural complexity of DoubleU-Net to decrease computational costs without compromising accuracy. Furthermore, exploring integration with post-processing techniques and other convolutional blocks presents promising avenues for further enhancement of segmentation capabilities.

In summary, the paper shows that stacking two U-Nets, with a pre-trained VGG-19 encoder and ASPP, improves segmentation accuracy over U-Net across several imaging modalities. DoubleU-Net is positioned as a strong baseline for medical image segmentation and for computer-aided diagnosis research.
