FuseFormer: A Transformer for Visual and Thermal Image Fusion (2402.00971v2)

Published 1 Feb 2024 in cs.CV

Abstract: Due to the lack of a definitive ground truth for the image fusion problem, the loss functions are structured based on evaluation metrics, such as the structural similarity index measure (SSIM). However, in doing so, a bias is introduced toward the SSIM and, consequently, the input visual band image. The objective of this study is to propose a novel methodology for the image fusion problem that mitigates the limitations associated with using classical evaluation metrics as loss functions. Our approach integrates a transformer-based multi-scale fusion strategy that adeptly addresses local and global context information. This integration not only refines the individual components of the image fusion process but also significantly enhances the overall efficacy of the method. Our proposed method follows a two-stage training approach, where an auto-encoder is initially trained to extract deep features at multiple scales in the first stage. For the second stage, we integrate our fusion block and change the loss function as mentioned. The multi-scale features are fused using a combination of Convolutional Neural Networks (CNNs) and Transformers. The CNNs are utilized to capture local features, while the Transformer handles the integration of general context features. Through extensive experiments on various benchmark datasets, our proposed method, along with the novel loss function definition, demonstrates superior performance compared to other competitive fusion algorithms.
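To make the fusion idea in the abstract concrete, below is a minimal, hypothetical sketch of a single-scale CNN + Transformer fusion block in PyTorch. It is not the authors' code: the class name, channel sizes, number of layers, and the way the local and global branches are combined are all illustrative assumptions; only the overall idea (a convolutional branch for local detail and a self-attention branch for global context, fused per scale) comes from the abstract.

```python
# Hypothetical sketch of the CNN + Transformer fusion idea from the abstract.
# Names, dimensions, and layer counts are illustrative assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Fuses visible and thermal features at one scale: a small CNN branch
    captures local detail, a Transformer encoder models global context."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: conv stack over the concatenated feature maps.
        self.local = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Global branch: self-attention over flattened spatial tokens.
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.global_ctx = nn.TransformerEncoder(layer, num_layers=2)
        # Simple 1x1 projection to merge the two branches (an assumption).
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_vis: torch.Tensor, feat_ir: torch.Tensor) -> torch.Tensor:
        x = torch.cat([feat_vis, feat_ir], dim=1)             # (B, 2C, H, W)
        local = self.local(x)                                  # local detail
        b, c, h, w = local.shape
        tokens = local.flatten(2).transpose(1, 2)              # (B, H*W, C)
        global_feat = self.global_ctx(tokens)                  # global context
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.proj(torch.cat([local, global_feat], dim=1))

# Conceptually, stage 2 of the described pipeline would apply such a block to
# the pretrained encoder's features at each scale, decode the fused features,
# and train with the paper's modified loss rather than a plain SSIM objective.
```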
