MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation (2105.07451v2)

Published 16 May 2021 in eess.IV and cs.CV

Abstract: Methods based on convolutional neural networks have improved the performance of biomedical image segmentation. However, most of these methods cannot efficiently segment objects of variable sizes or train on small and biased datasets, which are common in biomedical use cases. While methods exist that incorporate multi-scale fusion approaches to address the challenges arising with variable sizes, they usually use complex models that are more suitable for general semantic segmentation problems. In this paper, we propose a novel architecture called Multi-Scale Residual Fusion Network (MSRF-Net), which is specially designed for medical image segmentation. The proposed MSRF-Net is able to exchange multi-scale features of varying receptive fields using a Dual-Scale Dense Fusion (DSDF) block. Our DSDF block can exchange information rigorously across two different resolution scales, and our MSRF sub-network uses multiple DSDF blocks in sequence to perform multi-scale fusion. This allows the preservation of resolution, improved information flow, and propagation of both high- and low-level features to obtain accurate segmentation maps. The proposed MSRF-Net can capture object variabilities and provides improved results on different biomedical datasets. Extensive experiments demonstrate that the proposed method outperforms cutting-edge medical image segmentation methods on four publicly available datasets. We achieve Dice coefficients of 0.9217, 0.9420, 0.9224, and 0.8824 on Kvasir-SEG, CVC-ClinicDB, the 2018 Data Science Bowl dataset, and the ISIC-2018 skin lesion segmentation challenge dataset, respectively. We further conducted generalizability tests and achieved Dice coefficients of 0.7921 and 0.7575 on CVC-ClinicDB and Kvasir-SEG, respectively.



Summary

The paper introduces MSRF-Net, a multi-scale residual fusion network whose Dual-Scale Dense Fusion (DSDF) blocks exchange features across resolution scales to improve biomedical image segmentation.
The architecture also includes a gated shape stream for improved boundary detection and a triple attention mechanism in the decoder to refine segmentation outputs.
Evaluated on four public datasets, MSRF-Net outperformed state-of-the-art models, achieving Dice coefficients of 0.9217 on Kvasir-SEG and 0.9420 on CVC-ClinicDB.

Overview of MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation

The paper "MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation" introduces a deep learning architecture for biomedical image segmentation, a task complicated by objects of highly variable size and by small, biased training datasets. Its central component is the Dual-Scale Dense Fusion (DSDF) block, which exchanges features between two resolution scales. The paper shows how chaining several DSDF blocks in the Multi-Scale Residual Fusion (MSRF) sub-network performs multi-scale fusion while preserving resolution, and how this design accounts for the improved performance of the model.

Core Concepts and Architecture

Dual-Scale Dense Fusion (DSDF) Block:

The DSDF block is the core building block of MSRF-Net. It operates on feature maps at two resolution scales and exchanges information between them through residual, densely connected convolutional layers. This residual dense design improves information flow, so that both high- and low-level features propagate through the network, and it helps the model capture the variability in object size and appearance found in biomedical images.
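The dual-scale exchange can be illustrated with a minimal NumPy sketch. It is a simplified stand-in, not the paper's implementation: the learned convolutions are replaced by a hypothetical random 1×1 channel-mixing layer (`mix`), downsampling uses 2×2 average pooling (`down2`) and upsampling uses nearest-neighbour repetition (`up2`) instead of learned layers, and the names `dsdf_block`, `growth`, and `n_layers` are illustrative assumptions. What it does show is the structure: each stream densely concatenates all of its previous features, also receives the other stream's features resampled to its own resolution, and ends with a residual connection so both outputs keep their input shape.

```python
import numpy as np

def down2(x):
    """2x2 average pooling on a (C, H, W) map (stand-in for a learned strided conv)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up2(x):
    """Nearest-neighbour 2x upsampling (stand-in for a learned transposed conv)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def mix(x, out_ch, rng):
    """Hypothetical layer: random 1x1 channel mixing followed by ReLU."""
    w = rng.standard_normal((out_ch, x.shape[0])) / np.sqrt(x.shape[0])
    return np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)

def dsdf_block(x_hi, x_lo, growth=4, n_layers=3, seed=0):
    """Sketch of dual-scale dense fusion between a high-res and a low-res map."""
    rng = np.random.default_rng(seed)
    hi_feats, lo_feats = [x_hi], [x_lo]
    for _ in range(n_layers):
        hi_in = np.concatenate(hi_feats, axis=0)  # dense: all previous hi-res features
        lo_in = np.concatenate(lo_feats, axis=0)  # dense: all previous lo-res features
        # Cross-scale exchange: each stream also sees the other, resampled to its size.
        new_hi = mix(np.concatenate([hi_in, up2(lo_in)], axis=0), growth, rng)
        new_lo = mix(np.concatenate([lo_in, down2(hi_in)], axis=0), growth, rng)
        hi_feats.append(new_hi)
        lo_feats.append(new_lo)
    # Project back to the input channel count and add a residual connection.
    out_hi = x_hi + mix(np.concatenate(hi_feats, axis=0), x_hi.shape[0], rng)
    out_lo = x_lo + mix(np.concatenate(lo_feats, axis=0), x_lo.shape[0], rng)
    return out_hi, out_lo
```

For example, `dsdf_block(np.ones((8, 16, 16)), np.ones((16, 8, 8)))` returns maps of shape `(8, 16, 16)` and `(16, 8, 8)`: resolution is preserved at both scales, which is the property that lets several such blocks be chained.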

MSRF Sub-network:

The MSRF sub-network applies multiple DSDF blocks in sequence, each operating on a pair of resolution scales, so that information gradually propagates across all scales. Because resolution is preserved rather than repeatedly reduced, the sub-network retains the high-resolution feature representations needed for precise segmentation maps and accurate boundaries.
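Why a sequence of dual-scale blocks amounts to multi-scale fusion can be seen by tracking which scales have influenced each stream. The sketch below assumes four scales and a hypothetical alternating schedule of adjacent pairs, (0, 1) and (2, 3) on even steps and (1, 2) on odd steps; the exact arrangement of DSDF blocks in the paper is not given here, and the names `dsdf_schedule` and `receptive_scales` are illustrative.

```python
def dsdf_schedule(n_scales, n_steps):
    """Hypothetical alternating schedule of adjacent scale pairs, one list per step."""
    schedule = []
    for step in range(n_steps):
        start = step % 2  # even steps: (0,1),(2,3),...; odd steps: (1,2),(3,4),...
        pairs = [(i, i + 1) for i in range(start, n_scales - 1, 2)]
        schedule.append(pairs)
    return schedule

def receptive_scales(n_scales, schedule):
    """Track which original scales have influenced each scale's features."""
    seen = [{i} for i in range(n_scales)]
    for pairs in schedule:
        updated = [set(s) for s in seen]
        for a, b in pairs:
            merged = seen[a] | seen[b]  # a dual-scale block exchanges in both directions
            updated[a] = merged
            updated[b] = set(merged)
        seen = updated
    return seen

sched = dsdf_schedule(4, 3)
print(sched)                           # [[(0, 1), (2, 3)], [(1, 2)], [(0, 1), (2, 3)]]
print(receptive_scales(4, sched[:2]))  # [{0, 1}, {0, 1, 2, 3}, {0, 1, 2, 3}, {2, 3}]
print(receptive_scales(4, sched))      # every scale has seen {0, 1, 2, 3}
```

Under this assumed schedule, two steps are not enough for the outermost scales to see each other, but after three steps every scale carries information from all four, even though each block only ever fuses two.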

Shape Stream and Triple Attention Mechanism:

The architecture also includes a gated shape stream, which processes the fused features produced by the DSDF blocks to improve the delineation of object shapes and boundaries. In addition, the decoder uses a triple attention mechanism that emphasizes relevant spatial features and suppresses irrelevant ones, further refining the segmentation output.
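The gating and attention ideas can be sketched in NumPy. This is an illustration under stated assumptions, not the paper's exact layers: the gate (`gated_shape_unit`) computes a per-pixel weight in (0, 1) from a hypothetical 1×1 convolution over the concatenated shape and fused features and multiplies the shape features by it, and `channel_attention` (squeeze-and-excitation style) and `spatial_attention` are two common attention forms. How the paper combines its three attention branches is not reproduced here; all function names and weight shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_shape_unit(shape_feat, region_feat, w_gate):
    """Hypothetical gate: per-pixel weights from both streams modulate shape features.

    shape_feat, region_feat: (C, H, W); w_gate: (2C,) weights of a 1x1 conv.
    """
    stacked = np.concatenate([shape_feat, region_feat], axis=0)
    alpha = sigmoid(np.einsum("c,chw->hw", w_gate, stacked))  # (H, W), values in (0, 1)
    return shape_feat * alpha[None, :, :], alpha

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel re-weighting; w1: (R, C), w2: (C, R)."""
    z = x.mean(axis=(1, 2))                    # squeeze: global average per channel
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: per-channel weight
    return x * s[:, None, None]

def spatial_attention(x):
    """Re-weight pixels by a sigmoid of the channel-averaged response."""
    m = sigmoid(x.mean(axis=0))                # (H, W)
    return x * m[None, :, :]

rng = np.random.default_rng(0)
s = rng.standard_normal((4, 8, 8))             # toy shape-stream features
f = rng.standard_normal((4, 8, 8))             # toy fused features
gated, alpha = gated_shape_unit(s, f, rng.standard_normal(8))
y = spatial_attention(channel_attention(f, rng.standard_normal((2, 4)),
                                        rng.standard_normal((4, 2))))
print(gated.shape, y.shape)                    # (4, 8, 8) (4, 8, 8)
```

The design choice common to all three functions is multiplicative gating with a sigmoid: features are never replaced, only scaled between zero and their original value, so irrelevant responses are suppressed without discarding the signal path.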

Results and Evaluation

MSRF-Net was evaluated on four publicly available medical image datasets, where it outperformed existing state-of-the-art (SOTA) segmentation models. It achieved Dice coefficients (DSC) of 0.9217 on Kvasir-SEG, 0.9420 on CVC-ClinicDB, 0.9224 on the 2018 Data Science Bowl dataset, and 0.8824 on ISIC-2018, indicating robust and accurate performance across diverse segmentation tasks. In generalizability tests, the model achieved Dice coefficients of 0.7921 on CVC-ClinicDB and 0.7575 on Kvasir-SEG.
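The Dice coefficient used for these results measures overlap between a predicted mask A and a ground-truth mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect match). A minimal sketch for binary masks given as flat 0/1 sequences follows; the small smoothing term `eps`, which avoids division by zero when both masks are empty, is a common convention assumed here, and how the paper aggregates per-image scores is not specified in this summary.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))  # |A ∩ B|
    total = sum(pred) + sum(target)                            # |A| + |B|
    return (2.0 * intersection + eps) / (total + eps)

print(round(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]), 4))  # 0.6667
```

In the example, one of the two predicted foreground pixels matches the single true foreground pixel, giving 2·1 / (2 + 1) ≈ 0.667.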

Implications and Future Directions

More precise and reliable segmentation could support disease detection and treatment planning in clinical practice. Because the architecture is designed to cope with small and biased datasets, it may also help automated medical imaging systems work with less annotated data, potentially facilitating more personalized patient care.

More broadly, the results suggest that densely connected networks with explicit multi-scale fusion strategies are effective for medical image segmentation, and that hierarchical feature extraction merits further study.

Future work could adapt MSRF-Net to other imaging modalities and medical tasks. Improving its computational efficiency and further integrating attention mechanisms could also make it more suitable for real-time diagnostics and intervention planning.