Enhancing Retinal Vascular Structure Segmentation in Images With a Novel Design Two-Path Interactive Fusion Module Model (2403.01362v1)

Published 3 Mar 2024 in eess.IV and cs.CV

Abstract: Precision in identifying and differentiating micro and macro blood vessels in the retina is crucial for the diagnosis of retinal diseases, although it poses a significant challenge. Current autoencoding-based segmentation approaches encounter limitations as they are constrained by the encoder and undergo a reduction in resolution during the encoding stage. The inability to recover lost information in the decoding phase further impedes these approaches. Consequently, their capacity to extract the retinal microvascular structure is restricted. To address this issue, we introduce Swin-Res-Net, a specialized module designed to enhance the precision of retinal vessel segmentation. Swin-Res-Net utilizes the Swin transformer, which uses shifted windows with displacement for partitioning, to reduce network complexity and accelerate model convergence. Additionally, the model incorporates interactive fusion with a functional module in the Res2Net architecture. The Res2Net leverages multi-scale techniques to enlarge the receptive field of the convolutional kernel, enabling the extraction of additional semantic information from the image. This combination creates a new module that enhances the localization and separation of micro vessels in the retina. To improve the efficiency of processing vascular information, we've added a module to eliminate redundant information between the encoding and decoding steps. Our proposed architecture produces outstanding results, either meeting or surpassing those of other published models. The AUC reflects significant enhancements, achieving values of 0.9956, 0.9931, and 0.9946 in pixel-wise segmentation of retinal vessels across three widely utilized datasets: CHASE-DB1, DRIVE, and STARE, respectively. Moreover, Swin-Res-Net outperforms alternative architectures, demonstrating superior performance in both IOU and F1 measure metrics.

Summary

  • The paper introduces a two-path interactive fusion model that integrates Swin Transformer and Res2Net within a U-Net to improve retinal vessel segmentation.
  • It employs a dual encoder strategy to efficiently fuse multi-scale features, enhancing the detection of small vascular structures and reducing redundant information.
  • Experimental results on CHASE-DB1, DRIVE, and STARE datasets demonstrate superior performance with AUC values exceeding 0.993, outperforming existing segmentation models.

Enhancing Retinal Vascular Structure Segmentation Through Swin-Res-Net Architecture

Introduction

In medical imaging, and in ophthalmology in particular, accurate segmentation of retinal vessels is critical for the effective diagnosis of a range of diseases. Traditional autoencoding approaches struggle here, largely because resolution is lost during the encoding phase and cannot be fully recovered during decoding, which limits their efficacy in extracting microvascular structure. Swin-Res-Net, a novel architecture devised by Rui Yang and Shunpu Zhang, aims to address these limitations by leveraging the strengths of the Swin Transformer within a U-Net framework, integrated with a two-path interactive fusion module.

Methodology

Swin-Res-Net takes a two-pronged approach, integrating the Swin Transformer and Res2Net within the encoder of a U-Net architecture via a novel interactive fusion strategy. The methodology comprises several key components, illustrated by the code sketch after the list:

  • Feature Extraction Encoder: Utilizes both Swin Transformer and Res2Net architectures to capture a comprehensive range of information from retinal images. The encoder benefits from the Swin Transformer's ability to model long-range dependencies and the Res2Net's capacity to enhance the receptive field via multi-scale techniques.
  • Interactive Fusion Module: The novel core of the architecture, this module efficiently fuses the outputs of the two encoder paths, enhancing the model's ability to identify and segment retinal vessels, especially smaller microvascular structures.
  • Redundant Information Reduction: This component aims to streamline the transition between encoding and decoding phases, preserving essential information while eliminating redundancy, thereby boosting processing efficiency and model performance.
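
The paper does not ship reference code, so the following PyTorch sketch is only an illustrative reconstruction of the two-path pattern described above: the Swin path is stood in for by a plain self-attention block (real Swin uses shifted-window attention), the Res2Net path by a simplified multi-scale residual block, and the mutual-gating fusion rule is an assumption rather than the authors' exact module.

```python
# Minimal sketch of a two-path encoder stage with interactive fusion.
# All module names and the gating-based fusion rule are illustrative
# assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class Res2NetBlock(nn.Module):
    """Simplified Res2Net-style block: split channels, cascade 3x3 convs."""
    def __init__(self, channels, scales=4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        w = channels // scales
        self.convs = nn.ModuleList(
            [nn.Conv2d(w, w, 3, padding=1) for _ in range(scales - 1)]
        )

    def forward(self, x):
        chunks = list(torch.chunk(x, self.scales, dim=1))
        out, prev = [chunks[0]], 0
        for i, conv in enumerate(self.convs):
            y = conv(chunks[i + 1] + prev)  # hierarchical residual connection
            out.append(y)
            prev = y
        return torch.cat(out, dim=1) + x    # multi-scale receptive field, identity kept

class AttentionBlock(nn.Module):
    """Stand-in for a Swin stage: plain self-attention over flattened pixels."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        t = self.norm(x.flatten(2).transpose(1, 2))     # (B, HW, C)
        t, _ = self.attn(t, t, t)                       # long-range dependencies
        return x + t.transpose(1, 2).reshape(b, c, h, w)

class InteractiveFusion(nn.Module):
    """Fuse the two paths: each path gates the other, then concat + project."""
    def __init__(self, channels):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, a, b):
        a2 = a * self.gate_b(b)   # conv features modulate attention features
        b2 = b * self.gate_a(a)   # and vice versa ("interactive" exchange)
        return self.proj(torch.cat([a2, b2], dim=1))

class TwoPathStage(nn.Module):
    """One encoder stage: run both paths on the shared input, then fuse."""
    def __init__(self, channels):
        super().__init__()
        self.attn_path = AttentionBlock(channels)
        self.conv_path = Res2NetBlock(channels)
        self.fuse = InteractiveFusion(channels)

    def forward(self, x):
        return self.fuse(self.attn_path(x), self.conv_path(x))

x = torch.randn(1, 32, 64, 64)        # toy retinal feature map
print(TwoPathStage(32)(x).shape)      # torch.Size([1, 32, 64, 64])
```

In a full U-Net, a stage like this would be repeated at each encoder resolution, with the fused features passed both downward and across skip connections, where the paper's redundancy-reduction module would then filter them before decoding.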

Experimental Results

The Swin-Res-Net architecture was evaluated on three benchmark datasets: CHASE-DB1, DRIVE, and STARE, where it achieved AUC values of 0.9956, 0.9931, and 0.9946, respectively, in pixel-wise vessel segmentation. It also outperformed competing architectures on IoU and F1 measure, demonstrating its effectiveness in retinal vessel segmentation.
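
Since the results are reported as pixel-wise AUC, IoU, and F1, it may help to see how these metrics are typically computed for binary vessel masks. The sketch below uses NumPy and scikit-learn; the 0.5 threshold, the absence of field-of-view masking, and pooled (rather than per-image) averaging are simplifying assumptions, as evaluation protocols vary between papers.

```python
# Hedged sketch of pixel-wise evaluation for binary vessel segmentation.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

def evaluate(prob_map, gt_mask, threshold=0.5):
    """prob_map: float array of vessel probabilities; gt_mask: {0,1} array."""
    y_true = gt_mask.reshape(-1).astype(int)
    y_prob = prob_map.reshape(-1)
    y_pred = (y_prob >= threshold).astype(int)
    auc = roc_auc_score(y_true, y_prob)       # threshold-free ranking quality
    f1 = f1_score(y_true, y_pred)             # harmonic mean of precision/recall
    inter = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    iou = inter / union if union else 1.0     # intersection over union
    return {"AUC": auc, "F1": f1, "IoU": iou}

# Toy example with synthetic predictions on a 64x64 "image".
rng = np.random.default_rng(0)
gt = (rng.random((64, 64)) > 0.9).astype(int)   # sparse "vessel" pixels
prob = 0.7 * gt + 0.3 * rng.random((64, 64))    # noisy but informative scores
print(evaluate(prob, gt))
```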

Conclusion and Future Directions

Swin-Res-Net exemplifies a significant step forward in retinal vessel segmentation, combining the strengths of the Swin Transformer and Res2Net within a U-Net architecture to enhance the precision of vascular structure identification. Beyond addressing the nuanced challenges of retinal imaging, the architecture suggests promising avenues for future work: optimizing the two-path design and extending it to other data modalities and medical imaging tasks.

In sum, Swin-Res-Net emerges as a robust solution for the precise segmentation of retinal vessel structures, signaling a promising direction for advancements in medical imaging technology and its application in diagnosing ophthalmological conditions.