Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion
Abstract: Multi-modal image fusion integrates complementary information from different modalities to produce enhanced and informative images. Although State-Space Models such as Mamba are proficient in long-range modeling with linear complexity, most Mamba-based approaches use fixed scanning strategies, which can introduce biased prior information. To mitigate this issue, we propose a novel Bayesian-inspired scanning strategy called Random Shuffle, supplemented by a theoretically feasible inverse shuffle that maintains information coordination invariance, with the aim of eliminating the biases associated with fixed sequence scanning. Based on this transformation pair, we design the Shuffle Mamba Framework, which incorporates modality-aware information representation and cross-modality information interaction across the spatial and channel axes to ensure robust interaction and an unbiased global receptive field for multi-modal image fusion. Furthermore, we develop a testing methodology based on Monte-Carlo averaging so that the model's output aligns more closely with the expected result. Extensive experiments across multiple multi-modal image fusion tasks demonstrate the effectiveness of the proposed method, which yields superior fusion quality compared with state-of-the-art alternatives. Code will be available upon acceptance.
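The shuffle, inverse shuffle, and Monte-Carlo averaging described in the abstract can be illustrated with a minimal sketch on a 1-D token sequence. This is not the authors' implementation: `toy_scan` is a hypothetical stand-in for an order-dependent sequence model such as a Mamba block, and all function names, the `decay` parameter, and the sample count are assumptions made for illustration only.

```python
import random


def random_shuffle(seq, rng):
    """Return a shuffled copy of seq together with the permutation used."""
    perm = list(range(len(seq)))
    rng.shuffle(perm)
    shuffled = [seq[i] for i in perm]
    return shuffled, perm


def inverse_shuffle(shuffled, perm):
    """Move every element back to the position it had before the shuffle."""
    restored = [None] * len(shuffled)
    for new_pos, old_pos in enumerate(perm):
        restored[old_pos] = shuffled[new_pos]
    return restored


def toy_scan(seq, decay=0.5):
    """Hypothetical order-dependent model: a causal exponential running average."""
    out, state = [], 0.0
    for x in seq:
        state = decay * state + (1.0 - decay) * x
        out.append(state)
    return out


def shuffled_forward(seq, rng):
    """Shuffle, scan in the shuffled order, then restore the original order."""
    shuffled, perm = random_shuffle(seq, rng)
    return inverse_shuffle(toy_scan(shuffled), perm)


def monte_carlo_forward(seq, num_samples, seed=0):
    """Average the outputs of several independently shuffled forward passes."""
    rng = random.Random(seed)
    total = [0.0] * len(seq)
    for _ in range(num_samples):
        out = shuffled_forward(seq, rng)
        total = [t + o for t, o in zip(total, out)]
    return [t / num_samples for t in total]


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]
    print(monte_carlo_forward(tokens, num_samples=64))
```

In this sketch the inverse shuffle guarantees that each output stays aligned with the position of its input token, while averaging over many random orders approximates the expectation over scan orders instead of committing to a single fixed one.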