
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

Published 15 Apr 2024 in cs.CV (arXiv:2404.09498v3)

Abstract: Multimodal image fusion integrates information from different imaging techniques into a single, comprehensive, detail-rich image for downstream vision tasks. Existing methods based on local convolutional neural networks (CNNs) struggle to capture global features efficiently, while Transformer-based models excel at global modeling but are computationally expensive. Mamba addresses both limitations by leveraging selective structured state-space models to handle long-range dependencies while maintaining linear complexity. In this paper, we propose FusionMamba, a novel dynamic feature enhancement framework designed to overcome the respective shortcomings of CNNs and Vision Transformers (ViTs) in computer vision tasks. The framework improves the visual state-space model Mamba by integrating dynamic convolution and channel attention mechanisms, retaining Mamba's powerful global feature modeling capability while greatly reducing redundancy and enhancing the expressiveness of local features. In addition, we develop a new dynamic feature fusion module (DFFM). It combines a dynamic feature enhancement module (DFEM), which handles texture enhancement and disparity perception, with a cross-modal fusion Mamba module (CMFM), which strengthens inter-modal correlation while suppressing redundant information. Experiments show that FusionMamba achieves state-of-the-art performance on a variety of multimodal image fusion tasks as well as downstream experiments, demonstrating its broad applicability and superiority.
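The abstract mentions a channel attention mechanism used to suppress redundant channels before fusion. The paper itself gives no implementation details here, so the following is only a minimal, framework-free sketch of a generic squeeze-and-excitation-style channel attention gate, not the authors' actual module; the shapes, reduction ratio, and random projection weights are illustrative assumptions.

```python
import numpy as np

def channel_attention(x, reduction=4, seed=0):
    """Hypothetical SE-style channel attention sketch.

    x: feature map of shape (C, H, W).
    Returns x with each channel reweighted by a learned-style gate in (0, 1).
    """
    c, h, w = x.shape
    # Squeeze: global average pool over spatial dimensions -> (C,)
    squeeze = x.reshape(c, -1).mean(axis=1)
    # Excitation: two projections (stand-ins for trained FC layers) with
    # ReLU then sigmoid; weights are random here purely for illustration.
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c // reduction, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c // reduction)) / np.sqrt(c // reduction)
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid -> per-channel weights
    # Scale: reweight channels, attenuating redundant ones
    return x * gate[:, None, None]

features = np.ones((8, 4, 4))
reweighted = channel_attention(features)
```

In a fusion pipeline such a gate would typically sit after feature extraction for each modality, so that low-information channels contribute less before cross-modal mixing.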
