
H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation (2403.13642v1)

Published 20 Mar 2024 in cs.CV

Abstract: In the field of medical image segmentation, variant models based on Convolutional Neural Networks (CNNs) and Visual Transformers (ViTs) as the base modules have been very widely developed and applied. However, CNNs are often limited in their ability to deal with long sequences of information, while the low sensitivity of ViTs to local feature information and the problem of secondary computational complexity limit their development. Recently, the emergence of state-space models (SSMs), especially 2D-selective-scan (SS2D), has had an impact on the longtime dominance of traditional CNNs and ViTs as the foundational modules of visual neural networks. In this paper, we extend the adaptability of SS2D by proposing a High-order Vision Mamba UNet (H-vmunet) for medical image segmentation. Among them, the proposed High-order 2D-selective-scan (H-SS2D) progressively reduces the introduction of redundant information during SS2D operations through higher-order interactions. In addition, the proposed Local-SS2D module improves the learning ability of local features of SS2D at each order of interaction. We conducted comparison and ablation experiments on three publicly available medical image datasets (ISIC2017, Spleen, and CVC-ClinicDB), and the results all demonstrate the strong competitiveness of H-vmunet in medical image segmentation tasks. The code is available from https://github.com/wurenkai/H-vmunet .

High-Order Vision Mamba UNet for Medical Image Segmentation

The paper "H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation" proposes a medical image segmentation model that combines State-Space Models (SSMs), in the form of a High-order 2D-selective-scan (H-SS2D), with the UNet framework. It targets two known limitations of existing backbones: Convolutional Neural Networks (CNNs) struggle with long-range dependencies, while Vision Transformers (ViTs) are less sensitive to local features and incur quadratic computational cost.

Proposed Methodology

The core contribution of the paper is the High-order Vision Mamba UNet (H-vmunet). The architecture is built around H-SS2D, an extension of the 2D-selective-scan (SS2D) operation that uses higher-order interactions to progressively reduce the redundant information introduced during SS2D while keeping a wide receptive field. A Local-SS2D module additionally improves the learning of local features at each order of interaction. The H-SS2D operation is implemented within a High-order Visual State Space (H-VSS) module, which serves as the feature-extraction block throughout the network.
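To make the idea of "higher-order interactions" more concrete, the following is a minimal NumPy sketch of a generic recursive gating pattern: the feature map is split into channel groups of increasing width, and each step widens the running features, mixes them globally, and gates them element-wise with the next group. This is an illustration under stated assumptions, not the authors' implementation. The function names, the untrained random projections, and in particular `placeholder_global_mix` (a crude stand-in for SS2D) are hypothetical.

```python
import numpy as np

def placeholder_global_mix(x):
    """Hypothetical stand-in for SS2D: blend each channel with its spatial mean."""
    # x has shape (channels, height, width)
    return 0.5 * x + 0.5 * x.mean(axis=(1, 2), keepdims=True)

def high_order_interaction(x, order=3, rng=None):
    """Toy n-order gated interaction over a (C, H, W) feature map."""
    rng = np.random.default_rng(0) if rng is None else rng
    channels = x.shape[0]
    # Group widths grow with the order: C / 2^(order-1), ..., C / 2, C
    dims = [channels // 2 ** i for i in range(order)][::-1]
    # Untrained 1x1 projection producing the initial features plus all gates
    w_in = rng.standard_normal((dims[0] + sum(dims), channels)) / np.sqrt(channels)
    projected = np.einsum("oc,chw->ohw", w_in, x)
    p = projected[:dims[0]]
    gates = np.split(projected[dims[0]:], np.cumsum(dims)[:-1], axis=0)
    # First-order interaction: element-wise gating of the narrowest group
    p = p * gates[0]
    for k in range(1, order):
        # Widen p to the next group size after global mixing, then gate again
        w_k = rng.standard_normal((dims[k], dims[k - 1])) / np.sqrt(dims[k - 1])
        p = np.einsum("oc,chw->ohw", w_k, placeholder_global_mix(p)) * gates[k]
    return p

features = np.ones((8, 4, 4))
output = high_order_interaction(features, order=3)
print(output.shape)
```

Each pass through the loop multiplies the running features by another gate, so the output depends on products of up to `order` projected copies of the input. The channel count must be divisible by `2 ** (order - 1)` for the split to work.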

The H-vmunet maintains the U-shaped architecture typical of UNet models, comprising an encoder, decoder, and skip connections that preserve spatial information crucial for medical segmentation. By replacing traditional convolutional modules with the H-VSS module, the proposed model can leverage both global and local feature extraction capabilities, enhancing the segmentation of complex medical images that contain subtle lesion details.
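The U-shaped data flow described here can be sketched in a few lines of NumPy. This is a toy skeleton under assumptions, not the H-vmunet code: `block` is a hypothetical placeholder for whatever feature-extraction module is used at each stage (the H-VSS module in the paper), pooling and upsampling are the simplest possible choices, and skips are merged by addition, whereas many UNet variants concatenate instead.

```python
import numpy as np

def downsample(x):
    """Halve height and width of a (C, H, W) map with 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """Double height and width by nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def u_shaped_forward(x, block, depth=3):
    """Encoder, bottleneck, and decoder with additive skip connections."""
    skips = []
    for _ in range(depth):
        x = block(x)
        skips.append(x)          # keep this resolution for the decoder
        x = downsample(x)
    x = block(x)                 # bottleneck at the coarsest resolution
    for skip in reversed(skips):
        x = upsample(x) + skip   # skip connection restores spatial detail
        x = block(x)
    return x

image_features = np.arange(2 * 16 * 16, dtype=float).reshape(2, 16, 16)
result = u_shaped_forward(image_features, block=lambda t: t, depth=3)
print(result.shape)
```

Height and width must be divisible by `2 ** depth`. Passing the identity as `block` only checks the plumbing; a real model would use a learned module and change channel counts between stages.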

Experimental Results

The effectiveness of H-vmunet was demonstrated through extensive experiments on three publicly available medical image datasets: ISIC2017, Spleen, and CVC-ClinicDB. The results showed a significant improvement in segmentation accuracy compared to existing state-of-the-art models, including several UNet variants and Transformer-based architectures.

The proposed H-vmunet achieved a higher Dice Similarity Coefficient (DSC) than the compared models on all three datasets, which the paper attributes to its ability to capture fine-grained details while suppressing irrelevant information. The model also has 67.28% fewer parameters than the Vision Mamba UNet (VM-UNet) baseline, indicating a substantially more compact network.
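For readers unfamiliar with the two quantities reported above, here is a short sketch of how they are commonly computed. The DSC of two binary masks is twice their overlap divided by the sum of their sizes, and a parameter reduction is the relative drop from a baseline count. The function names and the smoothing term `eps` are assumptions for illustration, and the masks and counts in the usage lines are made up, not results from the paper.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps avoids division by zero when both masks are empty
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def parameter_reduction(baseline_params, new_params):
    """Percentage reduction in parameter count relative to a baseline."""
    return 100.0 * (1.0 - new_params / baseline_params)

# Illustrative inputs only
predicted_mask = [[1, 1], [0, 0]]
reference_mask = [[1, 0], [0, 0]]
print(round(dice_coefficient(predicted_mask, reference_mask), 4))  # 0.6667
print(parameter_reduction(100, 25))  # 75.0
```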

Implications and Future Work

The introduction of H-vmunet holds substantial implications for the field of medical image segmentation. By effectively balancing computational efficiency with segmentation accuracy, this work paves the way for the deployment of more responsive and resource-conscious medical image analysis applications, which are vital in real-world clinical environments.

Theoretically, this research expands the applicability of state-space models in visual processing tasks, suggesting a possible shift from traditional CNNs and ViTs toward more memory-efficient models capable of handling higher-order interactions. The promising results also suggest that such models may prove useful in other image-intensive tasks beyond medical segmentation.

Future research could explore the application of H-vmunet in various medical contexts, considering diverse imaging modalities and integrating domain-specific knowledge to enhance model robustness. Additionally, investigating the integration of H-vmunet with other innovative architectures and techniques could further refine its performance and applicability. This includes exploring adaptive mechanisms for dynamically adjusting the order of spatial interactions based on the complexity and context of input images.

In conclusion, the paper explores how high-order interactions can advance medical image segmentation and presents a model that is both parameter-efficient and accurate on the evaluated datasets, making it a useful contribution to deep learning for medical imaging.

Authors (4)
  1. Renkai Wu (4 papers)
  2. Yinghao Liu (3 papers)
  3. Pengchen Liang (10 papers)
  4. Qing Chang (23 papers)