A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion (2404.09293v2)
Abstract: In image fusion tasks, images from different sources possess distinct characteristics, which has driven the development of numerous methods for fusing them while preserving their respective characteristics. Mamba, a state space model, emerged in natural language processing, and many recent studies have attempted to extend it to vision tasks. However, because images differ in nature from causal language sequences, the limited state capacity of Mamba weakens its ability to model image information. Additionally, the sequence modeling ability of Mamba captures only spatial information and cannot effectively capture the rich spectral information in images. Motivated by these challenges, we customize and improve the vision Mamba network for the image fusion task. Specifically, we propose the local-enhanced vision Mamba block, dubbed LEVM. The LEVM block improves the network's perception of local information and learns local and global spatial information simultaneously. Furthermore, we propose the state sharing technique to enhance spatial details and integrate spatial and spectral information. Finally, the overall network is a multi-scale structure based on vision Mamba, called LE-Mamba. Extensive experiments show that the proposed method achieves state-of-the-art results on multispectral pansharpening and on multispectral and hyperspectral image fusion datasets, demonstrating its effectiveness. Code is available at https://github.com/294coder/Efficient-MIF.
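To make the point about causal sequences concrete, here is a minimal sketch of a generic linear state space recurrence scanned over a flattened image. It is not LE-Mamba, the LEVM block, or Mamba itself (Mamba makes its parameters input-dependent); the function names `ssm_scan` and `flatten_row_major`, the diagonal parameters, and the toy 2x2 image are all assumptions chosen for illustration. The sketch shows that each output depends only on pixels earlier in the scan order, and that all past context is compressed into a fixed-size state.

```python
def ssm_scan(xs, a, b, c):
    """Scan a linear state space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over a fixed-size state)
    y_t = sum_i c[i] * h_t[i]
    """
    state = [0.0] * len(a)  # fixed-size state: all the memory the model has
    ys = []
    for x in xs:
        state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return ys


def flatten_row_major(image):
    """Turn a 2-D image (list of rows) into a 1-D pixel sequence."""
    return [pixel for row in image for pixel in row]


# Hypothetical diagonal parameters with a state of size 2.
A = [0.5, 0.9]
B = [1.0, 1.0]
C = [1.0, 0.5]

image = [[1.0, 2.0], [3.0, 4.0]]
out = ssm_scan(flatten_row_major(image), A, B, C)

# Changing only the last pixel in scan order leaves earlier outputs untouched,
# so the first pixel never "sees" its spatial neighbour in the row below.
image_changed = [[1.0, 2.0], [3.0, 9.0]]
out_changed = ssm_scan(flatten_row_major(image_changed), A, B, C)
```

In this toy setting, the output at the top-left pixel carries no information about the pixel directly beneath it, which is one way to see why a single causal scan is a poor fit for 2-D image structure.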
Authors: Xiao Wu, Yu Zhong, Liang-jian Deng, ZiHan Cao