Infrared and Visible Image Fusion with Language-Driven Loss in CLIP Embedding Space (2402.16267v1)
Abstract: Infrared-visible image fusion (IVIF) has attracted much attention owing to the highly complementary properties of the two image modalities. Because no ground-truth fused images exist, the output of current deep-learning-based methods depends heavily on mathematically defined loss functions. Since the desired fused image is hard to characterize mathematically without ground truth, the performance of existing fusion methods is limited. In this paper, we propose to express the objective of IVIF in natural language, which avoids the explicit mathematical modeling of the fusion output required by current losses and exploits the expressive power of language to improve fusion performance. To this end, we formulate a comprehensive language-expressed fusion objective and encode the relevant texts into a multi-modal embedding space using CLIP. A language-driven fusion model is then constructed in this embedding space by establishing relationships among the embedded vectors that represent the fusion objective and the input image modalities. Finally, a language-driven loss is derived to align the actual IVIF output with the embedded language-driven fusion model via supervised training. Experiments show that our method obtains substantially better fusion results than existing techniques.
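The abstract describes deriving a loss from relationships among CLIP embeddings of the fused image and of texts expressing the fusion objective and the source modalities. The sketch below illustrates one plausible form of such a language-driven loss; it is an assumption for illustration, not the paper's exact formulation, and the function name `language_driven_loss`, the weights `w_obj`/`w_mod`, and the toy 4-d vectors (standing in for real CLIP embeddings) are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def language_driven_loss(img_emb, text_obj_emb, text_ir_emb, text_vis_emb,
                         w_obj=1.0, w_mod=0.5):
    """Hypothetical language-driven fusion loss in a CLIP-like embedding space.

    Pulls the fused image's embedding toward the text describing the fusion
    objective, while also keeping it close to texts describing the infrared
    and visible modalities (a sketch, not the paper's exact loss).
    """
    # Distance to the language-expressed fusion objective.
    loss_obj = 1.0 - cosine(img_emb, text_obj_emb)
    # Distances to the two input-modality descriptions.
    loss_mod = (1.0 - cosine(img_emb, text_ir_emb)) + \
               (1.0 - cosine(img_emb, text_vis_emb))
    return w_obj * loss_obj + w_mod * loss_mod

# Toy 4-d "embeddings" standing in for 512-d CLIP vectors.
rng = np.random.default_rng(0)
img, obj = rng.normal(size=4), rng.normal(size=4)
ir, vis = rng.normal(size=4), rng.normal(size=4)
print(language_driven_loss(img, obj, ir, vis))
```

In an actual training loop, `img_emb` would be produced by CLIP's image encoder on the fused output (kept differentiable), and the text embeddings would come from CLIP's text encoder, so gradients of this loss flow back into the fusion network.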