MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion (2404.08406v1)

Published 12 Apr 2024 in cs.CV

Abstract: Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image that comprehensively represents the imaging scene and facilitates downstream visual tasks. In recent years, significant progress has been made in MMIF due to advances in deep neural networks. However, existing methods cannot extract modality-specific and modality-fused features both effectively and efficiently, constrained by the inherent local inductive bias of CNNs or the quadratic computational complexity of Transformers. To overcome this issue, we propose a Mamba-based Dual-phase Fusion (MambaDFuse) model. First, a dual-level feature extractor is designed to capture long-range features from single-modality images, extracting low- and high-level features with CNN and Mamba blocks, respectively. Then, a dual-phase feature fusion module is proposed to obtain fused features that combine complementary information from the different modalities: it uses channel exchange for shallow fusion and enhanced Multi-modal Mamba (M3) blocks for deep fusion. Finally, a fused-image reconstruction module applies the inverse transformation of the feature extraction to generate the fused result. In extensive experiments, our approach achieves promising results on infrared-visible image fusion and medical image fusion. On a unified benchmark, MambaDFuse also demonstrates improved performance on downstream tasks such as object detection. Code and checkpoints will be available after the peer-review process.
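The abstract outlines a three-stage pipeline: dual-level feature extraction (CNN for low-level features, Mamba blocks for high-level features), dual-phase fusion (parameter-free channel exchange, then learned M3 blocks), and reconstruction by inverting the extractor. Below is a minimal PyTorch sketch of that control flow, assuming single-channel inputs and a fixed half-and-half exchange split; the plain convolutions standing in for the Mamba and M3 blocks, and all layer names and sizes, are illustrative assumptions rather than the authors' implementation, which had not been released at the time of writing.

```python
import torch
import torch.nn as nn


def channel_exchange(feat_a, feat_b, ratio=0.5):
    """Shallow fusion: swap a fixed fraction of channels between two
    (B, C, H, W) feature maps. The fixed leading-channel split is an
    assumption; the paper's exchange criterion may differ."""
    k = int(feat_a.shape[1] * ratio)
    a_out = torch.cat([feat_b[:, :k], feat_a[:, k:]], dim=1)
    b_out = torch.cat([feat_a[:, :k], feat_b[:, k:]], dim=1)
    return a_out, b_out


class DualPhaseFusionSketch(nn.Module):
    """Toy stand-in for the MambaDFuse pipeline. Plain convolutions
    replace the Mamba / M3 blocks purely for illustration."""

    def __init__(self, ch=32):
        super().__init__()
        self.low_a = nn.Conv2d(1, ch, 3, padding=1)  # low-level CNN branch, modality A
        self.low_b = nn.Conv2d(1, ch, 3, padding=1)  # low-level CNN branch, modality B
        self.high = nn.Conv2d(ch, ch, 3, padding=1)  # placeholder for Mamba blocks
        self.deep_fuse = nn.Conv2d(2 * ch, ch, 1)    # placeholder for M3 deep fusion
        self.reconstruct = nn.Conv2d(ch, 1, 3, padding=1)  # inverse of extraction

    def forward(self, ir, vis):
        fa = self.high(self.low_a(ir))    # dual-level extraction, modality A
        fb = self.high(self.low_b(vis))   # dual-level extraction, modality B
        fa, fb = channel_exchange(fa, fb)               # phase 1: shallow fusion
        fused = self.deep_fuse(torch.cat([fa, fb], 1))  # phase 2: deep fusion
        return self.reconstruct(fused)                  # fused-image reconstruction


# Dummy infrared/visible pair; the fused output matches the input resolution.
model = DualPhaseFusionSketch()
out = model(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```

The point of the two phases is visible in the sketch: channel exchange is a cheap, parameter-free step that gives each branch a slice of the other modality's channels, after which the learned deep-fusion stage mixes both streams jointly.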

Authors (5)
  1. Zhe Li (210 papers)
  2. Haiwei Pan (5 papers)
  3. Kejia Zhang (15 papers)
  4. Yuhua Wang (4 papers)
  5. Fengming Yu (1 paper)