DePF: A Novel Fusion Approach based on Decomposition Pooling for Infrared and Visible Images (2305.17376v2)
Abstract: Infrared and visible image fusion aims to generate synthetic images that simultaneously contain salient features and rich texture details, which can be used to boost downstream tasks. However, existing fusion methods suffer from texture loss and edge-information deficiency, which lead to suboptimal fusion results. Meanwhile, straightforward up-sampling operators cannot adequately preserve the source information carried by multi-scale features. To address these issues, a novel fusion network based on decomposition pooling (de-pooling) is proposed, termed DePF. Specifically, a de-pooling-based encoder is designed to extract multi-scale image features and detail features of the source images simultaneously. In addition, a spatial attention model is used to aggregate these salient features. The fused features are then reconstructed by the decoder, in which the up-sampling operator is replaced by the reversed de-pooling operation. Unlike the common max-pooling technique, features processed by the de-pooling layer retain abundant detail information, which benefits the fusion process. As a result, rich texture information and multi-scale information are preserved during the reconstruction phase. Experimental results demonstrate that the proposed method achieves superior fusion performance over state-of-the-art methods on multiple image fusion benchmarks.
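The abstract describes the de-pooling encoder and its reversed decoder operation only at a high level. Below is a minimal sketch of what an invertible decomposition-pooling pair could look like, assuming a simple polyphase split with an averaged low-frequency branch; the class names DecompositionPool and DecompositionUnpool and the exact split are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DecompositionPool(nn.Module):
    """Downsample by 2x while keeping the information needed to invert the step."""

    def forward(self, x):
        # Split the feature map into its four 2x2 polyphase components.
        a = x[:, :, 0::2, 0::2]
        b = x[:, :, 0::2, 1::2]
        c = x[:, :, 1::2, 0::2]
        d = x[:, :, 1::2, 1::2]
        low = (a + b + c + d) / 4              # low-frequency "image" feature
        details = (a - low, b - low, c - low)  # high-frequency "detail" features
        return low, details


class DecompositionUnpool(nn.Module):
    """Reverse of DecompositionPool, used in place of plain up-sampling."""

    def forward(self, low, details):
        da, db, dc = details
        a, b, c = low + da, low + db, low + dc
        d = 4 * low - a - b - c                # the fourth component is implied
        n, ch, h, w = low.shape
        out = low.new_zeros(n, ch, 2 * h, 2 * w)
        out[:, :, 0::2, 0::2] = a
        out[:, :, 0::2, 1::2] = b
        out[:, :, 1::2, 0::2] = c
        out[:, :, 1::2, 1::2] = d
        return out


# Round-trip check: the pooling/de-pooling pair loses no information,
# unlike max-pooling followed by interpolation-based up-sampling.
x = torch.randn(1, 16, 64, 64)
low, details = DecompositionPool()(x)
assert torch.allclose(DecompositionUnpool()(low, details), x, atol=1e-6)
```

Under this assumed split, the pooled representation keeps the detail components explicitly, so the decoder can restore them exactly rather than interpolating, which is the property the abstract attributes to the de-pooling layers.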