Vision Mamba: A Comprehensive Survey and Taxonomy (2405.04404v1)

Published 7 May 2024 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: The State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems. It has seen numerous applications across fields including control theory, signal processing, economics, and machine learning. In deep learning, state space models are used to process sequence data, such as time series, NLP, and video understanding. By mapping sequence data to a state space, long-term dependencies in the data can be better captured. In particular, modern SSMs have shown strong representational capability in NLP, especially in long-sequence modeling, while maintaining linear time complexity. Notably, building on the latest state space models, Mamba introduces time-varying (input-dependent) parameters into the SSM and formulates a hardware-aware algorithm for efficient training and inference. Given its impressive efficiency and strong long-range dependency modeling, Mamba is expected to become a new AI architecture that may outperform the Transformer. Recently, a number of works have studied Mamba's potential in various fields, such as general vision, multi-modal learning, medical image analysis, and remote sensing image analysis, by extending Mamba from the natural language domain to the visual domain. To provide a full understanding of Mamba in the visual domain, we conduct a comprehensive survey and present a taxonomy. This survey focuses on Mamba's application to a variety of visual tasks and data types, and discusses its predecessors, recent advances, and far-reaching impact on a wide range of domains. Since Mamba is on an upward trend, please notify us of new findings; new progress on Mamba will be incorporated into this survey in a timely manner and updated in the Mamba project at https://github.com/lx6c78/Vision-Mamba-A-Comprehensive-Survey-and-Taxonomy.
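To make the abstract's core idea concrete, the following is a minimal NumPy sketch of a selective SSM recurrence in the spirit of Mamba: the state transition is a discretized diagonal linear system, but the step size Δ and the input/output projections B and C are computed from the current input, which is what "merging time-varying parameters into the SSM" refers to. This is an illustrative sketch only, not the paper's or the official Mamba implementation; the weight names (`W_B`, `W_C`, `W_dt`) and the simplified Euler input term are assumptions for clarity. Note the loop visits each timestep once, which is the source of the linear time complexity mentioned above.

```python
import numpy as np

def softplus(z):
    # Smooth positive activation used to keep the step size dt > 0.
    return np.log1p(np.exp(z))

def selective_ssm(x, A, W_B, W_C, W_dt):
    """Run a toy selective state space scan over a sequence.

    x    : (T, d) input sequence (T timesteps, d channels)
    A    : (d, n) diagonal state matrix (negative entries for stability)
    W_B  : (d, n) projection producing the input-dependent B_t
    W_C  : (d, n) projection producing the input-dependent C_t
    W_dt : (d, d) projection producing the input-dependent step size dt_t
    """
    T, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                          # hidden state, one n-dim state per channel
    ys = np.empty((T, d))
    for t in range(T):                            # single pass -> linear time in T
        dt = softplus(x[t] @ W_dt)[:, None]       # (d, 1) input-dependent step size
        B = (x[t] @ W_B)[None, :]                 # (1, n) input-dependent input projection
        C = x[t] @ W_C                            # (n,)   input-dependent output projection
        A_bar = np.exp(dt * A)                    # (d, n) zero-order-hold discretization of diag A
        h = A_bar * h + dt * B * x[t][:, None]    # selective recurrence (simplified input term)
        ys[t] = h @ C                             # read out each channel's state
    return ys
```

In an LTI SSM (e.g. S4), `dt`, `B`, and `C` would be fixed learned parameters; making them functions of `x[t]` lets the model decide per token what to write into and read out of the state, at the cost of no longer admitting a global convolutional form, which is why Mamba instead relies on a hardware-aware parallel scan.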

Authors (3)
  1. Xiao Liu
  2. Chenxu Zhang
  3. Lei Zhang
Citations (12)