A Multi-scale Information Integration Framework for Infrared and Visible Image Fusion (2312.04328v2)
Abstract: Infrared and visible image fusion aims to generate a fused image that contains the intensity and detail information of the source images; the key issue is effectively measuring and integrating the complementary information of multi-modality images from the same scene. Existing methods mostly adopt a simple weight in the loss function to decide how much information each modality retains, rather than adaptively measuring the complementary information for different image pairs. In this study, we propose a multi-scale dual attention (MDA) framework for infrared and visible image fusion, designed to measure and integrate complementary information in both the architecture and the loss function, at the image and patch levels. In our method, a residual downsample block first decomposes the source images into three scales. A dual attention fusion block then integrates complementary information and generates spatial and channel attention maps at each scale for feature fusion. Finally, the output image is reconstructed by a residual reconstruction block. The loss function consists of image-level, feature-level, and patch-level parts, of which the image-level and patch-level parts are computed with weights generated by the complementary information measurement. In addition, a style loss is added to constrain the pixel intensity distribution between the output and the infrared image. Our fusion results are robust and informative across different scenarios. Qualitative and quantitative results on two datasets illustrate that our method preserves both thermal radiation and detail information from the two modalities and achieves results comparable to other state-of-the-art methods. Ablation experiments show the effectiveness of our information integration architecture and of adaptively measuring complementary information retention in the loss function.
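To make the dual attention fusion idea concrete, below is a minimal PyTorch sketch of a spatial-and-channel attention fusion block applied to infrared and visible features at one scale. The layer sizes, the reduction ratio, and the final blending rule are assumptions for illustration; the abstract only states that spatial and channel attention maps are generated at each scale to fuse the two modalities, not the exact operations used in the paper.

```python
import torch
import torch.nn as nn


class DualAttentionFusion(nn.Module):
    """Sketch of a dual (channel + spatial) attention fusion block.

    Hypothetical layer sizes and fusion rule; not the paper's exact design.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention: squeeze-and-excitation style gating over the
        # concatenated infrared/visible features.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: a 7x7 convolution over channel-wise statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        cat = torch.cat([feat_ir, feat_vis], dim=1)
        # Channel attention map, shape (B, C, 1, 1), values in [0, 1].
        ca = self.channel_gate(cat)
        # Spatial attention map from mean/max over channels, shape (B, 1, H, W).
        sa = self.spatial_gate(
            torch.cat([cat.mean(dim=1, keepdim=True),
                       cat.max(dim=1, keepdim=True).values], dim=1)
        )
        # One possible fusion rule: attention-weighted blend of the two branches.
        weight = ca * sa
        return weight * feat_ir + (1 - weight) * feat_vis


if __name__ == "__main__":
    block = DualAttentionFusion(channels=32)
    ir = torch.randn(1, 32, 64, 64)
    vis = torch.randn(1, 32, 64, 64)
    print(block(ir, vis).shape)  # torch.Size([1, 32, 64, 64])
```

In the full framework this block would be applied at each of the three scales produced by the residual downsample block, with the fused features passed to the residual reconstruction block.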
Authors: Guang Yang, Jie Li, Hanxiao Lei, Xinbo Gao