MMA-UNet: A Multi-Modal Asymmetric UNet Architecture for Infrared and Visible Image Fusion (2404.17747v2)

Published 27 Apr 2024 in cs.CV

Abstract: Multi-modal image fusion (MMIF) maps useful information from various modalities into a shared representation space, thereby producing an informative fused image. However, existing fusion algorithms tend to fuse the multi-modal images symmetrically, which causes the loss of shallow information or a bias towards a single modality in certain regions of the fusion results. In this study, we analyzed the differences in the spatial distribution of information across modalities and showed that encoding features within the same network is not conducive to achieving simultaneous deep feature-space alignment for multi-modal images. To overcome this issue, a Multi-Modal Asymmetric UNet (MMA-UNet) is proposed. We separately trained specialized feature encoders for the different modalities and implemented a cross-scale fusion strategy to keep the features from different modalities within the same representation space, ensuring a balanced information fusion process. Furthermore, extensive fusion and downstream-task experiments demonstrate the effectiveness of MMA-UNet in fusing infrared and visible image information, producing visually natural and semantically rich fusion results whose quality surpasses that of state-of-the-art comparison fusion methods.
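
To make the asymmetric design concrete, below is a minimal PyTorch sketch of a dual-encoder, cross-scale fusion scheme in the spirit of the abstract. All module names, encoder depths, channel widths, and the one-stage scale offset are illustrative assumptions, not the paper's actual MMA-UNet implementation.

```python
# Minimal sketch of an asymmetric dual-encoder fusion network in PyTorch.
# All module names, depths, and channel widths are illustrative assumptions;
# the paper's actual MMA-UNet layout is not reproduced here.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions, as in a standard UNet encoder stage.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    # One modality-specific encoder; infrared and visible inputs each get
    # their own instance (the "separately trained specialized feature
    # encoders" mentioned in the abstract).
    def __init__(self, in_ch, widths=(32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.stages.append(conv_block(ch, w))
            ch = w
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        feats = []
        for i, stage in enumerate(self.stages):
            x = stage(x if i == 0 else self.pool(x))
            feats.append(x)  # keep every scale for cross-scale fusion
        return feats

class CrossScaleFusion(nn.Module):
    # Asymmetric fusion: pair a visible feature at scale s with an infrared
    # feature from scale s+1 (upsampled), so the two modalities meet at
    # offset encoder depths. The one-stage offset is an assumption made
    # purely for illustration.
    def __init__(self, vis_ch, ir_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.merge = nn.Conv2d(vis_ch + ir_ch, out_ch, 1)

    def forward(self, vis_feat, ir_feat_deeper):
        ir_up = self.up(ir_feat_deeper)  # bring deeper IR feature to VIS resolution
        return self.merge(torch.cat([vis_feat, ir_up], dim=1))

if __name__ == "__main__":
    ir = torch.rand(1, 1, 128, 128)
    vis = torch.rand(1, 1, 128, 128)
    enc_ir, enc_vis = Encoder(1), Encoder(1)
    f_ir, f_vis = enc_ir(ir), enc_vis(vis)
    fuse = CrossScaleFusion(vis_ch=32, ir_ch=64, out_ch=32)
    fused = fuse(f_vis[0], f_ir[1])  # VIS scale 0 fused with IR scale 1
    print(fused.shape)               # torch.Size([1, 32, 128, 128])
```

The design point this sketch illustrates is that each modality passes through its own encoder, and fusion pairs features from different encoder depths rather than matching scales symmetrically, which is how an asymmetric scheme can avoid discarding one modality's shallow information.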

