MedMamba: Vision Mamba for Medical Image Classification (2403.03849v5)
Abstract: Since the advent of deep learning, convolutional neural networks (CNNs) and vision transformers (ViTs) have been extensively studied and widely used in medical image classification. However, CNNs' limited ability to model long-range dependencies constrains their classification performance, while ViTs are hampered by the quadratic computational complexity of self-attention, which makes them difficult to deploy in real-world settings with limited computational resources. Recent studies have shown that state space models (SSMs), represented by Mamba, can effectively model long-range dependencies while maintaining linear computational complexity. Inspired by this, we propose MedMamba, the first Vision Mamba for generalized medical image classification. Concretely, we introduce a novel hybrid basic block named SS-Conv-SSM, which combines convolutional layers for extracting local features with the ability of SSMs to capture long-range dependencies, aiming to model medical images from different imaging modalities efficiently. By employing a grouped convolution strategy and a channel-shuffle operation, MedMamba has fewer model parameters and a lower computational burden without sacrificing accuracy. We thoroughly evaluate MedMamba on 16 datasets containing ten imaging modalities and 411,007 images. Experimental results show that MedMamba achieves competitive performance on most tasks compared with state-of-the-art methods. This work aims to explore the potential of Vision Mamba and establish a new baseline for medical image classification, thereby providing valuable insights for developing more powerful Mamba-based artificial intelligence algorithms and applications in medicine. The source code and all pre-trained weights of MedMamba are available at https://github.com/YubiaoYue/MedMamba.
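To make the efficiency claim concrete, the following is a minimal sketch of the two ingredients the abstract names: grouped convolution and channel shuffle. It is not taken from the MedMamba code. The function names `conv_param_count` and `channel_shuffle` are hypothetical, the channel sizes are illustrative, and the shuffle uses the common reshape-transpose-flatten permutation, which is an assumption about the layout MedMamba actually uses.

```python
def conv_param_count(c_in, c_out, kernel_size, groups=1):
    """Weight count of a 2D convolution (bias excluded).

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel_size * kernel_size


def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    `channels` is a list with one entry per channel (any object, for
    example a 2D feature map). Conceptually the list is viewed as
    [groups][per_group], transposed to [per_group][groups], and flattened,
    so that the next grouped layer mixes information between groups.
    """
    n = len(channels)
    if n % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


if __name__ == "__main__":
    # Illustrative sizes only; not MedMamba's real configuration.
    print(conv_param_count(64, 64, 3, groups=1))  # standard convolution
    print(conv_param_count(64, 64, 3, groups=4))  # grouped convolution
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

Grouping alone would keep each group's channels isolated from the others; the shuffle is what lets later layers combine features across groups at no parameter cost.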
Authors: Yubiao Yue, Zhenzhang Li