SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation (2004.03696v3)

Published 7 Apr 2020 in eess.IV and cs.CV

Abstract: The precise segmentation of retinal blood vessels is of great significance for early diagnosis of eye-related diseases such as diabetes and hypertension. In this work, we propose a lightweight network named Spatial Attention U-Net (SA-UNet) that does not require thousands of annotated training samples and can be utilized in a data augmentation manner to use the available annotated samples more efficiently. SA-UNet introduces a spatial attention module which infers the attention map along the spatial dimension, and multiplies the attention map by the input feature map for adaptive feature refinement. In addition, the proposed network employs structured dropout convolutional blocks instead of the original convolutional blocks of U-Net to prevent the network from overfitting. We evaluate SA-UNet based on two benchmark retinal datasets: the Vascular Extraction (DRIVE) dataset and the Child Heart and Health Study (CHASE_DB1) dataset. The results show that the proposed SA-UNet achieves state-of-the-art performance on both datasets. The implementation and the trained networks are available on GitHub.

Citations (279)

Summary

  • The paper introduces SA-UNet, which integrates structured dropout and a spatial attention module to improve retinal vessel segmentation.
  • It achieves state-of-the-art performance on DRIVE and CHASE_DB1, recording high accuracy, sensitivity, and AUC values.
  • The results suggest significant potential for clinical diagnostic applications, notably in resource-constrained settings using few annotated samples.

Overview of "SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation"

The paper introduces the Spatial Attention U-Net (SA-UNet), an approach to retinal vessel segmentation that tackles the challenges posed by the complexity of retinal blood vessels. Its key strengths are a lightweight architecture and the ability to perform well with limited annotated training samples.

SA-UNet distinguishes itself from traditional models by integrating a Spatial Attention Module (SAM) and employing structured dropout convolutional blocks to prevent overfitting. Rather than requiring thousands of annotated samples, the model relies on data augmentation to use the available annotations more efficiently. The research shows that SA-UNet achieves state-of-the-art performance on the DRIVE and CHASE_DB1 retinal datasets, highlighting its potential in diagnostic and prognostic eye-disease applications.

Methodology

The architecture of SA-UNet maintains a typical U-shaped encoder-decoder structure, modified by replacing standard convolutional blocks with structured dropout convolutional blocks and incorporating spatial attention. This strategic alteration is intended to emphasize features pertinent to vessel structures, suppressing irrelevant background features, thereby enhancing the representation capability of the network.

  1. Network Architecture:
    • The U-Net framework serves as the backbone of SA-UNet, known for its capability in medical image segmentation.
    • The paper replaces the traditional convolutional blocks with structured dropout convolutional blocks that integrate DropBlock and Batch Normalization (BN), addressing network overfitting and convergence speed.
  2. Spatial Attention Module (SAM):
    • Introduced between the encoder and decoder, SAM exploits spatial relationships within features to produce an attention map, improving the network's focus on vessel structures; both this module and the structured dropout block are sketched after this list.
  3. Implementation:
    • The network is trained with the Adam optimizer and a binary cross-entropy loss for 150 epochs, with the learning rate varied to stabilize training.
  4. Evaluation Metrics:
    • Performance is evaluated using sensitivity (SE), specificity (SP), F1-score (F1), accuracy (ACC), Matthews Correlation Coefficient (MCC), and the area under the receiver operating characteristic curve (AUC); a metric-computation sketch follows below.
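
To make the two modified building blocks concrete, the following is a minimal PyTorch sketch rather than the authors' released implementation: a structured dropout convolutional block (convolution → DropBlock → batch normalization → ReLU) and a CBAM-style spatial attention module. DropBlock2d is taken from torchvision (version 0.12 or later); the block size and drop probability are illustrative defaults, not values from the paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import DropBlock2d  # requires torchvision >= 0.12


class StructuredDropoutConv(nn.Module):
    """Conv -> DropBlock -> BatchNorm -> ReLU, replacing U-Net's plain conv block."""

    def __init__(self, in_ch, out_ch, block_size=7, drop_prob=0.1):
        # block_size / drop_prob are illustrative; see the paper/repo for actual settings.
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.drop = DropBlock2d(p=drop_prob, block_size=block_size)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.drop(self.conv(x))))


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: pool along channels, 7x7 conv, sigmoid gate."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_pool = x.mean(dim=1, keepdim=True)        # B x 1 x H x W
        max_pool = x.max(dim=1, keepdim=True).values  # B x 1 x H x W
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * attn                               # adaptive feature refinement
```

In the full network, the encoder and decoder are built from these structured dropout blocks, and the spatial attention module is applied at the bottleneck between them.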

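The evaluation metrics listed above can be computed per pixel from a predicted vessel-probability map and its ground-truth annotation. The sketch below uses scikit-learn; `probs`, `labels`, and the 0.5 binarization threshold are illustrative placeholders, not values taken from the paper.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, roc_auc_score)


def evaluate(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> dict:
    """Compute SE, SP, F1, ACC, MCC, and AUC from per-pixel probabilities and labels."""
    y_true = labels.ravel().astype(np.uint8)
    y_prob = probs.ravel()
    y_pred = (y_prob >= threshold).astype(np.uint8)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "SE": tp / (tp + fn),    # sensitivity: recall on vessel pixels
        "SP": tn / (tn + fp),    # specificity: recall on background pixels
        "F1": f1_score(y_true, y_pred),
        "ACC": accuracy_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_prob),  # AUC uses the raw probabilities
    }
```
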
Results and Discussion

The experimentation conducted with SA-UNet demonstrates its superiority over existing models. Ablation studies reveal that each component, namely structured dropout and spatial attention, contributes to the improved segmentation performance. Notably, SA-UNet outperforms prior models such as AG-Net across various metrics, including SE (0.8212/0.8573) and ACC (0.9698/0.9755) on the DRIVE and CHASE_DB1 datasets, respectively.

The results underscore the impact of integrating spatial attention mechanisms, evidenced by competitive AUC values (0.9864/0.9905) and high sensitivity. Furthermore, SA-UNet achieves a reduction in parameters compared to AG-Net, underscoring its efficiency and effectiveness on small sample datasets typical in retinal imaging.

Implications and Future Direction

The advances introduced by SA-UNet enable more accurate retinal vessel segmentation, which is critical for diagnosing diseases such as diabetic retinopathy and hypertension. The method's efficiency with limited data has broader implications for medical informatics and resource-constrained environments, advocating for lightweight, robust models.

Future research could focus on enhancing interpretability and exploring domain adaptation for diverse fundus imaging modalities. Additionally, the integration of SA-UNet into clinical diagnostic settings warrants exploration, considering regulatory and real-world data challenges.