
UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation (2403.20035v3)

Published 29 Mar 2024 in eess.IV and cs.CV

Abstract: To improve segmentation performance, most approaches add increasingly complex modules. This is ill-suited to the medical field, and especially to mobile medical devices, where computationally heavy models cannot run in real clinical environments due to resource constraints. Recently, state-space models (SSMs), represented by Mamba, have become strong competitors to traditional CNNs and Transformers. In this paper, we deeply explore the key elements of parameter influence in Mamba and propose the UltraLight Vision Mamba UNet (UltraLight VM-UNet) based on this analysis. Specifically, we propose a method for processing features in parallel with Vision Mamba, named the PVM Layer, which achieves excellent performance with the lowest computational load while keeping the overall number of processed channels constant. We conducted comparison and ablation experiments against several state-of-the-art lightweight models on three public skin lesion datasets and demonstrated that UltraLight VM-UNet is equally competitive with parameters of only 0.049M and GFLOPs of 0.060. In addition, this study deeply explores the key elements of parameter influence in Mamba, laying a theoretical foundation for Mamba to possibly become a new mainstream lightweighting module in the future. The code is available from https://github.com/wurenkai/UltraLight-VM-UNet .

Authors (4)
  1. Renkai Wu (4 papers)
  2. Yinghao Liu (3 papers)
  3. Pengchen Liang (10 papers)
  4. Qing Chang (23 papers)
Citations (17)

Summary

  • The paper introduces a novel Parallel Vision Mamba layer that reduces model parameters to 0.049 million and computational cost to 0.060 GFLOPs while maintaining accuracy.
  • The methodology employs parallel VSS Blocks to distribute feature channels, effectively mitigating exponential parameter growth typical in CNNs and Transformers.
  • Experimental results on ISIC2017, ISIC2018, and PH² datasets show a high Dice Similarity Coefficient of 0.9091, underscoring its clinical applicability.

Overview of UltraLight VM-UNet: Parallel Vision Mamba for Skin Lesion Segmentation

The paper presents a notable contribution to medical image segmentation with UltraLight VM-UNet, a model designed to address the computational constraints of mobile medical devices while maintaining competitive segmentation performance. The focus is on leveraging state-space models (SSMs), with Vision Mamba as the pivotal component, in place of the more computationally intensive CNNs and Transformers typically used in this domain.

Key Innovations and Methodological Contributions

Central to the UltraLight VM-UNet is the novel Parallel Vision Mamba (PVM) Layer, designed to process deep features at a drastically reduced computational load. It distributes the feature map across multiple VSS Blocks operating in parallel, each handling a fraction of the channels, so the total number of processed channels is preserved without the parameter growth that full-width processing incurs. This yields a model of merely 0.049 million parameters and 0.060 GFLOPs, making UltraLight VM-UNet the most lightweight among contemporary models incorporating Vision Mamba.
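The channel-splitting idea can be sketched in plain Python. This is a schematic, not the authors' implementation: `branch_fn` is a hypothetical stand-in for a narrow VSS Block, and real feature maps would be tensors rather than lists of per-channel values.

```python
def pvm_layer(x, branch_fn, num_branches=4):
    """Schematic of the Parallel Vision Mamba (PVM) Layer idea:
    split the channel dimension into equal groups, run each group
    through a small Mamba-style branch, and concatenate the results
    so the total channel count is unchanged.

    x          -- list of per-channel feature maps (stand-in for a tensor)
    branch_fn  -- stand-in for a VSS Block operating on C/num_branches channels
    """
    c = len(x)
    assert c % num_branches == 0, "channels must divide evenly across branches"
    step = c // num_branches
    out = []
    for i in range(num_branches):
        group = x[i * step:(i + 1) * step]  # each branch sees only C/4 channels
        out.extend(branch_fn(group))        # branch output keeps its group's width
    return out                              # concatenated back to C channels


# Toy usage: an identity "branch" preserves the overall channel count.
features = [[float(i)] for i in range(8)]
assert len(pvm_layer(features, lambda g: g)) == len(features)
```

The key point is that each branch only ever allocates parameters for its narrow slice of channels, while the concatenation restores the original width for the rest of the network.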

The authors delve deeply into understanding the parameter influence within Mamba, pinpointing the channel numbers as a critical element contributing to exponential parameter growth. By analyzing the structural components of the VSS Block and SS2D module, they effectively reduce channel-related parameter overhead through parallelization.
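A back-of-the-envelope calculation shows why splitting channels helps. Assuming, as a simplification rather than the paper's exact accounting, that a Mamba/VSS block's parameter count is dominated by channel-mixing projections that scale quadratically with channel width C, four parallel blocks of width C/4 shrink the quadratic term by a factor of four:

```python
def block_params(c, a=2, b=33):
    """Hypothetical parameter model p(C) = a*C^2 + b*C: 'a' stands for
    channel-mixing projection terms, 'b' for per-channel terms. The
    constants are illustrative, not measured from Mamba."""
    return a * c * c + b * c

C = 128
serial = block_params(C)              # one full-width block over all C channels
parallel = 4 * block_params(C // 4)   # four parallel quarter-width blocks
# The linear term b*C is unchanged, but the quadratic term drops to a quarter:
# 4 * a*(C/4)^2 = a*C^2 / 4.
assert parallel < serial
```

Under this toy model the savings grow with channel width, which matches the paper's observation that channel count is the dominant driver of Mamba's parameter growth.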

Experimental Evaluation and Results

The UltraLight VM-UNet's effectiveness is substantiated through comprehensive experiments on three publicly available datasets: ISIC2017, ISIC2018, and PH². The model not only outperforms several state-of-the-art lightweight models but also matches the performance of far more parameter-heavy models, highlighting its potential for clinical applicability given its efficiency. Noteworthy results include a Dice Similarity Coefficient (DSC) of 0.9091 on the ISIC2017 dataset, indicative of the model's high segmentation accuracy.
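For reference, the Dice Similarity Coefficient reported above is defined as DSC = 2|A∩B| / (|A| + |B|). A minimal version for flat binary masks (illustrative; not the evaluation code used in the paper) is:

```python
def dice(pred, target):
    """DSC = 2 * |intersection| / (|pred| + |target|) for 0/1 masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0  # treat two empty masks as a match

# Example: 1 overlapping pixel, mask sizes 2 and 1 -> 2*1 / (2+1) = 2/3
assert abs(dice([1, 1, 0, 0], [1, 0, 0, 0]) - 2 / 3) < 1e-9
```

A DSC of 0.9091 therefore means the predicted lesion mask overlaps the ground-truth mask almost completely relative to their combined area.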

Theoretical and Practical Implications

The work posits the potential for Vision Mamba, augmented through parallelization as demonstrated, to serve as a foundational module for developing lightweight computational models in medical imaging and beyond. The emphasis on reducing memory and computational demand without compromising on performance aligns with the requirements of real-time applications in constrained environments, such as mobile and edge computing platforms for medical diagnostics.

Future Directions

Given the established efficacy of UltraLight VM-UNet, further work on optimizing and adapting state-space models in other areas of computer vision and beyond could prove fruitful. Additionally, extending the method to other imaging modalities, or combining it with ongoing advances in SSMs, could enhance its robustness and adaptability.

In conclusion, UltraLight VM-UNet represents a significant step towards efficient, lightweight deep learning models in medical imaging, providing a foundation for future research in optimizing SSMs for broader applications.