State Space Model for New-Generation Network Alternative to Transformers: A Survey (2404.09516v1)
Abstract: In the post-deep-learning era, the Transformer architecture has demonstrated powerful performance across large pre-trained models and various downstream tasks. However, its enormous computational demands have deterred many researchers, and numerous efforts have been made to design more efficient attention methods. Among them, the State Space Model (SSM), a possible replacement for the self-attention-based Transformer, has drawn increasing attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM. Specifically, we first give a detailed description of the underlying principles to help readers quickly grasp the key ideas of SSM. After that, we review existing SSMs and their various applications, including natural language processing, computer vision, graphs, multi-modal and multimedia data, point clouds/event streams, time series, and other domains. In addition, we provide statistical comparisons and analysis of these models, which we hope will help readers understand the effectiveness of different structures on various tasks. Finally, we propose possible research directions to better promote the development of SSM theory and applications. More related works will be continuously updated at the following GitHub repository: https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.
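For readers new to the formulation the survey reviews, the sketch below illustrates the discretized linear state space recurrence that underlies models such as S4 and Mamba: a hidden state is updated by a learned transition and the output is a linear readout, giving O(L) cost in sequence length instead of the quadratic cost of self-attention. The parameter names (`A_bar`, `B_bar`, `C`), the toy dimensions, and the random values are illustrative assumptions, not the implementation of any specific model covered in the survey.

```python
import numpy as np

# Minimal sketch of a discretized linear state space model (SSM):
#   h_t = A_bar @ h_{t-1} + B_bar * x_t
#   y_t = C @ h_t
# All sizes and parameter values here are toy assumptions for illustration.

def ssm_scan(x, A_bar, B_bar, C):
    """Run the SSM recurrence over a 1-D input sequence x."""
    state_dim = A_bar.shape[0]
    h = np.zeros(state_dim)
    ys = []
    for x_t in x:                      # sequential scan, linear in sequence length
        h = A_bar @ h + B_bar * x_t    # state update
        ys.append(C @ h)               # linear readout
    return np.array(ys)

rng = np.random.default_rng(0)
N = 8                                              # toy state size
A_bar = np.diag(rng.uniform(0.5, 0.99, N))         # stable diagonal transition (as in diagonal SSM variants)
B_bar = rng.normal(size=N)
C = rng.normal(size=N)

x = rng.normal(size=64)                            # toy length-64 input sequence
y = ssm_scan(x, A_bar, B_bar, C)
print(y.shape)                                     # (64,)
```

Because `A_bar` is fixed per step, this recurrence can also be computed as a long convolution or a parallel scan; selective SSMs such as Mamba instead make the transition parameters input-dependent, which is what the surveyed works build on.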
Authors: Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang