
UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation (2403.20035v3)

Published 29 Mar 2024 in eess.IV and cs.CV

Abstract: To improve segmentation performance, most approaches add increasingly complex modules. This is ill-suited to the medical field, and especially to mobile medical devices, where computationally heavy models cannot run in real clinical environments due to resource constraints. Recently, state-space models (SSMs), represented by Mamba, have become strong competitors to traditional CNNs and Transformers. In this paper, we deeply explore the key elements of parameter influence in Mamba and propose the UltraLight Vision Mamba UNet (UltraLight VM-UNet) based on this analysis. Specifically, we propose a method for processing features in parallel with Vision Mamba, named the PVM Layer, which achieves excellent performance with the lowest computational load while keeping the overall number of processed channels constant. We conducted comparison and ablation experiments against several state-of-the-art lightweight models on three public skin lesion datasets and demonstrated that UltraLight VM-UNet is equally competitive with parameters of only 0.049M and GFLOPs of 0.060. In addition, this study deeply explores the key elements of parameter influence in Mamba, laying a theoretical foundation for Mamba to possibly become a new mainstream lightweighting module in the future. The code is available from https://github.com/wurenkai/UltraLight-VM-UNet .

Authors (4)
  1. Renkai Wu (4 papers)
  2. Yinghao Liu (3 papers)
  3. Pengchen Liang (10 papers)
  4. Qing Chang (23 papers)
Citations (17)

Summary

  • The paper introduces a novel Parallel Vision Mamba layer that reduces model parameters to 0.049 million and computational cost to 0.060 GFLOPs while maintaining accuracy.
  • The methodology employs parallel VSS Blocks to distribute feature channels, effectively mitigating exponential parameter growth typical in CNNs and Transformers.
  • Experimental results on ISIC2017, ISIC2018, and PH² datasets show a high Dice Similarity Coefficient of 0.9091, underscoring its clinical applicability.

Overview of UltraLight VM-UNet: Parallel Vision Mamba for Skin Lesion Segmentation

The paper presents a notable contribution to medical image segmentation with UltraLight VM-UNet, a model designed to address the computational constraints of mobile medical devices while maintaining competitive segmentation performance. The focus is on leveraging state-space models (SSMs), with Vision Mamba as the pivotal component, in place of the more computationally intensive CNNs and Transformers typically used in this domain.

Key Innovations and Methodological Contributions

Central to the UltraLight VM-UNet is the novel Parallel Vision Mamba (PVM) Layer, designed to process deep features at a drastically reduced computational load. It distributes the feature map across multiple VSS Blocks operating in parallel, each handling a fraction of the channels, so the total number of processed channels is preserved without the parameter growth that full-width processing incurs. This yields a model of merely 0.049 million parameters and 0.060 GFLOPs, making UltraLight VM-UNet the most lightweight among contemporary models incorporating Vision Mamba.
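The channel-splitting idea can be sketched in plain Python. This is a schematic, not the authors' implementation: `branch_fn` is a hypothetical stand-in for a narrow VSS Block, and real feature maps would be tensors rather than lists of per-channel values.

```python
def pvm_layer(x, branch_fn, num_branches=4):
    """Schematic of the Parallel Vision Mamba (PVM) Layer idea:
    split the channel dimension into equal groups, run each group
    through a small Mamba-style branch, and concatenate the results
    so the total channel count is unchanged.

    x          -- list of per-channel feature maps (stand-in for a tensor)
    branch_fn  -- stand-in for a VSS Block operating on C/num_branches channels
    """
    c = len(x)
    assert c % num_branches == 0, "channels must divide evenly across branches"
    step = c // num_branches
    out = []
    for i in range(num_branches):
        group = x[i * step:(i + 1) * step]  # each branch sees only C/4 channels
        out.extend(branch_fn(group))        # branch output keeps its group's width
    return out                              # concatenated back to C channels


# Toy usage: an identity "branch" preserves the overall channel count.
features = [[float(i)] for i in range(8)]
assert len(pvm_layer(features, lambda g: g)) == len(features)
```

The key point is that each branch only ever allocates parameters for its narrow slice of channels, while the concatenation restores the original width for the rest of the network.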

The authors delve deeply into understanding the parameter influence within Mamba, pinpointing the channel numbers as a critical element contributing to exponential parameter growth. By analyzing the structural components of the VSS Block and SS2D module, they effectively reduce channel-related parameter overhead through parallelization.
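A back-of-the-envelope calculation shows why splitting channels helps. Assuming, as a simplification rather than the paper's exact accounting, that a Mamba/VSS block's parameter count is dominated by channel-mixing projections that scale quadratically with channel width C, four parallel blocks of width C/4 shrink the quadratic term by a factor of four:

```python
def block_params(c, a=2, b=33):
    """Hypothetical parameter model p(C) = a*C^2 + b*C: 'a' stands for
    channel-mixing projection terms, 'b' for per-channel terms. The
    constants are illustrative, not measured from Mamba."""
    return a * c * c + b * c

C = 128
serial = block_params(C)              # one full-width block over all C channels
parallel = 4 * block_params(C // 4)   # four parallel quarter-width blocks
# The linear term b*C is unchanged, but the quadratic term drops to a quarter:
# 4 * a*(C/4)^2 = a*C^2 / 4.
assert parallel < serial
```

Under this toy model the savings grow with channel width, which matches the paper's observation that channel count is the dominant driver of Mamba's parameter growth.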

Experimental Evaluation and Results

The UltraLight VM-UNet's effectiveness is substantiated through comprehensive experiments on three publicly available datasets: ISIC2017, ISIC2018, and PH². The model not only outperforms several state-of-the-art lightweight models but also matches the performance of far more parameter-heavy models, highlighting its potential for clinical applicability given its efficiency. Noteworthy results include a Dice Similarity Coefficient (DSC) of 0.9091 on the ISIC2017 dataset, indicative of the model's high segmentation accuracy.
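For reference, the Dice Similarity Coefficient reported above is defined as DSC = 2|A∩B| / (|A| + |B|). A minimal version for flat binary masks (illustrative; not the evaluation code used in the paper) is:

```python
def dice(pred, target):
    """DSC = 2 * |intersection| / (|pred| + |target|) for 0/1 masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0  # treat two empty masks as a match

# Example: 1 overlapping pixel, mask sizes 2 and 1 -> 2*1 / (2+1) = 2/3
assert abs(dice([1, 1, 0, 0], [1, 0, 0, 0]) - 2 / 3) < 1e-9
```

A DSC of 0.9091 therefore means the predicted lesion mask overlaps the ground-truth mask almost completely relative to their combined area.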

Theoretical and Practical Implications

The work posits the potential for Vision Mamba, augmented through parallelization as demonstrated, to serve as a foundational module for developing lightweight computational models in medical imaging and beyond. The emphasis on reducing memory and computational demand without compromising on performance aligns with the requirements of real-time applications in constrained environments, such as mobile and edge computing platforms for medical diagnostics.

Future Directions

Given the established efficacy of UltraLight VM-UNet, further work on optimizing and adapting state-space models in other areas of computer vision and beyond could prove fruitful. Additionally, extending the method to other imaging modalities, or combining it with ongoing advances in SSMs, could enhance its robustness and adaptability.

In conclusion, UltraLight VM-UNet represents a significant step towards efficient, lightweight deep learning models in medical imaging, providing a foundation for future research in optimizing SSMs for broader applications.