
Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion

Published 3 Sep 2024 in cs.CV | (2409.01728v1)

Abstract: Multi-modal image fusion integrates complementary information from different modalities to produce enhanced and informative images. Although State-Space Models, such as Mamba, are proficient in long-range modeling with linear complexity, most Mamba-based approaches use fixed scanning strategies, which can introduce biased prior information. To mitigate this issue, we propose a novel Bayesian-inspired scanning strategy called Random Shuffle, supplemented by a theoretically feasible inverse shuffle to maintain information coordination invariance, aiming to eliminate the biases associated with fixed sequence scanning. Based on this transformation pair, we design the Shuffle Mamba Framework, which incorporates modality-aware information representation and cross-modality information interaction across the spatial and channel axes to ensure robust interaction and an unbiased global receptive field for multi-modal image fusion. Furthermore, we develop a testing methodology based on Monte-Carlo averaging so that the model's output aligns more closely with the expected result. Extensive experiments across multiple multi-modal image fusion tasks demonstrate the effectiveness of the proposed method, which yields better fusion quality than state-of-the-art alternatives. Code will be available upon acceptance.
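The shuffle, inverse shuffle, and Monte-Carlo averaging described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `causal_scan` is a hypothetical stand-in for an order-dependent sequence model (it is not Mamba), and all function names and parameters here are assumptions made for illustration. The sketch only shows the general idea that a random permutation is applied before the scan, undone afterwards so each output returns to its original position, and that averaging over several permutations at test time approximates the expected output.

```python
import random


def random_permutation(n, rng):
    """Return a random permutation of the indices 0..n-1."""
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def shuffle(seq, perm):
    """Reorder seq so that output position i holds seq[perm[i]]."""
    return [seq[p] for p in perm]


def inverse_shuffle(seq, perm):
    """Undo shuffle: send the element at position i back to position perm[i]."""
    out = [None] * len(seq)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out


def causal_scan(seq, decay=0.5):
    """Hypothetical order-dependent model: h_t = decay * h_{t-1} + x_t."""
    h = 0.0
    out = []
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out


def shuffled_scan(seq, rng):
    """Scan the sequence in a random order, then restore the original order."""
    perm = random_permutation(len(seq), rng)
    return inverse_shuffle(causal_scan(shuffle(seq, perm)), perm)


def monte_carlo_scan(seq, num_samples=8, seed=0):
    """Average the shuffled scan over several random permutations."""
    rng = random.Random(seed)
    total = [0.0] * len(seq)
    for _ in range(num_samples):
        for i, v in enumerate(shuffled_scan(seq, rng)):
            total[i] += v
    return [t / num_samples for t in total]


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0]
    print(monte_carlo_scan(tokens, num_samples=16))
```

In this toy setting a single shuffled pass depends on the sampled permutation, while the averaged result depends less on any one scanning order as `num_samples` grows.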

