- The paper introduces DenseFuse, a deep learning approach that combines a convolutional encoder-decoder with dense blocks to fuse infrared and visible images effectively.
- The method features an encoder, a fusion layer with addition and L1-norm strategies, and a decoder to reconstruct high-quality fused images.
- Experimental results show superior performance with enhanced feature preservation, noise reduction, and structural similarity compared to traditional techniques.
DenseFuse: A Fusion Approach to Infrared and Visible Images
The paper "DenseFuse: A Fusion Approach to Infrared and Visible Images" by Hui Li and Xiao-Jun Wu introduces a novel architecture for the fusion of infrared (IR) and visible (VIS) images using deep learning techniques. This work is a response to the limitations of traditional image fusion methods which often fail to effectively combine the complementary features of IR and VIS images.
The methodology employs a deep learning framework combining convolutional neural networks (CNNs) and dense blocks to enhance feature extraction and preserve salient information throughout the fusion process. The architecture comprises three main components: an encoder, a fusion layer, and a decoder. The encoder is built from CNN layers and a dense block, two different fusion strategies (addition and l1-norm) are used in the fusion layer to integrate the features obtained from the source images, and the decoder reconstructs the final fused image from the integrated features.
Key Components of the Proposed Method
- Encoder: The encoder network is composed of an initial CNN layer followed by a dense block containing several convolutional layers. Dense connections within the block feed each layer's output to every subsequent layer, so shallow features are retained rather than lost as they can be in conventional deep CNNs.
- Fusion Layer: The fusion layer merges the features extracted by the encoder using either the addition strategy or the l1-norm strategy. The addition strategy sums the feature maps element-wise, while the l1-norm strategy derives a per-pixel activity map from the l1-norm of the feature maps and normalizes these activity maps (a soft-max step) to obtain spatial fusion weights.
- Decoder: The decoder network, consisting of multiple convolutional layers, reconstructs the fused image from the integrated features produced by the fusion layer (a minimal sketch of the full pipeline follows this list).
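A minimal PyTorch sketch of the pipeline is given below. It follows the paper's description (a 3x3 input convolution, a three-layer dense block yielding 64-channel features, a four-layer decoder, and the two fusion strategies), but the exact filter counts, activation choices, and the names `Encoder`, `Decoder`, `fuse_addition`, and `fuse_l1norm` are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseBlock(nn.Module):
    """Dense block: each 3x3 convolution receives the concatenation of all earlier outputs."""

    def __init__(self, in_channels=16, growth=16, num_layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_channels + i * growth, growth, kernel_size=3, padding=1)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = x
        for conv in self.convs:
            out = F.relu(conv(features))
            features = torch.cat([features, out], dim=1)  # dense connectivity
        return features  # e.g. 16 + 3 * 16 = 64 channels


class Encoder(nn.Module):
    """Initial convolution followed by the dense block."""

    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # grayscale input
        self.dense = DenseBlock(in_channels=16)

    def forward(self, x):
        return self.dense(F.relu(self.c1(x)))


class Decoder(nn.Module):
    """Plain convolutional stack mapping fused features back to a one-channel image."""

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, f):
        return self.layers(f)


def fuse_addition(f_ir, f_vis):
    """Addition strategy: element-wise sum of the two sources' feature maps."""
    return f_ir + f_vis


def fuse_l1norm(f_ir, f_vis, r=1):
    """l1-norm strategy: per-pixel activity maps (l1-norm across channels), smoothed over a
    (2r+1)x(2r+1) window, then normalized so the two weights sum to one at every pixel."""
    act_ir = f_ir.abs().sum(dim=1, keepdim=True)
    act_vis = f_vis.abs().sum(dim=1, keepdim=True)
    k = 2 * r + 1
    act_ir = F.avg_pool2d(act_ir, k, stride=1, padding=r)
    act_vis = F.avg_pool2d(act_vis, k, stride=1, padding=r)
    w_ir = act_ir / (act_ir + act_vis + 1e-8)
    return w_ir * f_ir + (1.0 - w_ir) * f_vis


# Usage at test time: encode both sources, fuse their features, decode the result.
encoder, decoder = Encoder(), Decoder()
ir, vis = torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256)
fused = decoder(fuse_l1norm(encoder(ir), encoder(vis)))
```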
Training and Evaluation
The network is trained on grayscale images from the MS-COCO dataset. Because the dataset provides no registered infrared/visible pairs, the fusion layer is omitted during training and the encoder-decoder is trained purely to reconstruct its input; at test time, the source IR and VIS images are assumed to be pre-registered. Training minimizes a composite loss combining a pixel (reconstruction) loss and a structural similarity (SSIM) loss, with different weights assigned to the SSIM term to analyze its impact on training efficiency and fusion quality.
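The training objective can be sketched as a pixel loss plus a weighted SSIM term. The snippet below assumes images scaled to [0, 1] and an off-the-shelf SSIM implementation (the `pytorch_msssim` package here); neither detail is prescribed by the paper.

```python
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed third-party SSIM implementation


def densefuse_loss(output, target, lam=1.0):
    """Composite reconstruction loss: L = L_pixel + lam * L_ssim.

    L_pixel is the pixel-wise distance between the reconstruction and the original
    image, and L_ssim = 1 - SSIM(output, target). `lam` plays the role of the SSIM
    weight that the paper varies during its training experiments.
    """
    pixel_loss = F.mse_loss(output, target)
    ssim_loss = 1.0 - ssim(output, target, data_range=1.0)
    return pixel_loss + lam * ssim_loss
```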
Experimental Results
- Performance Metrics: The proposed method is benchmarked against several state-of-the-art techniques, including CBF, JSR, GTF, JSRSD, CNN, and DeepFuse. The assessment combines subjective visual comparison with seven objective metrics: entropy (En), Qabf, SCD, FMI_w, FMI_dct, SSIM_a, and MS_SSIM (a brief sketch of the entropy metric follows this list).
- Results: DenseFuse demonstrates superior performance, achieving the highest average value on five of the seven metrics and the second-best on the remaining two. These results indicate that the architecture excels at preserving structural information, suppressing noise, and producing natural-looking fused images. Visual assessments of the test images corroborate these findings, showing less artificial noise and better feature preservation than existing methods.
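As a concrete illustration of one of these metrics, the sketch below computes the entropy (En) of an 8-bit grayscale image. It is a generic histogram-based implementation, not the authors' evaluation code.

```python
import numpy as np


def image_entropy(img, bins=256):
    """Shannon entropy (in bits) of a grayscale image's intensity histogram.

    Higher values are usually read as more information retained in the fused image.
    """
    hist, _ = np.histogram(np.asarray(img).ravel(), bins=bins, range=(0, 255))
    p = hist.astype(np.float64) / hist.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())
```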
Practical and Theoretical Implications
- Infrared and Visible Image Fusion: DenseFuse provides a highly effective solution for applications requiring the combination of complementary IR and VIS images, such as surveillance, military, and medical imaging. Its robust feature extraction and integration capabilities ensure high-quality fused images that retain critical information from both modalities.
- Future Directions: The architecture's core idea can be extended to other image fusion tasks by adapting the fusion layer to different types of input, such as multi-focus or multi-exposure images. Future work could explore DenseFuse in these settings and potentially improve its performance with larger datasets or more elaborate training regimes.
Conclusion
DenseFuse, through its dense connectivity and advanced fusion strategies, represents a significant advancement in the field of image fusion. It addresses key limitations in previous methods, offering enhanced feature preservation and reduced noise, leading to visually and quantitatively superior fused images. This success opens new avenues for the application of deep learning in complex image processing tasks, with promising implications for various practical domains.