DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs (1712.07384v1)

Published 20 Dec 2017 in cs.CV

Abstract: We present a novel deep learning architecture for fusing static multi-exposure images. Current multi-exposure fusion (MEF) approaches use hand-crafted features to fuse the input sequence. However, the weak hand-crafted representations are not robust to varying input conditions. Moreover, they perform poorly for extreme exposure image pairs. Thus, it is highly desirable to have a method that is robust to varying input conditions and capable of handling extreme exposure without artifacts. Deep representations are known to be robust to input conditions and have shown phenomenal performance in a supervised setting. However, the stumbling block in using deep learning for MEF was the lack of sufficient training data and an oracle to provide the ground truth for supervision. To address the above issues, we have gathered a large dataset of multi-exposure image stacks for training and, to circumvent the need for ground truth images, we propose an unsupervised deep learning framework for MEF utilizing a no-reference quality metric as the loss function. The proposed approach uses a novel CNN architecture trained to learn the fusion operation without a reference ground truth image. The model fuses a set of common low-level features extracted from each image to generate artifact-free, perceptually pleasing results. We perform extensive quantitative and qualitative evaluation and show that the proposed technique outperforms existing state-of-the-art approaches for a variety of natural images.

Authors (3)
  1. V. Sai Srikar (1 paper)
  2. K. Ram Prabhakar (4 papers)
  3. R. Venkatesh Babu (108 papers)
Citations (508)

Summary

The paper presents "DeepFuse," an innovative approach to multi-exposure fusion (MEF) that leverages deep learning. Unlike traditional MEF methods that rely on hand-crafted features, DeepFuse adopts an unsupervised learning framework aimed at fusing static, extreme-exposure image pairs effectively and without artifacts.

Technical Details

DeepFuse utilizes a convolutional neural network (CNN) architecture designed to overcome limitations associated with conventional MEF techniques. Traditional methods often falter under varying input conditions, especially when dealing with extreme exposure pairs, because they depend on predetermined feature extraction rules. In contrast, DeepFuse trains its model using an extensive dataset of multi-exposure image stacks, circumventing the need for ground truth supervision by employing a no-reference image quality metric as a loss function.
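
To make the unsupervised objective concrete, the sketch below illustrates the general idea of using a no-reference score as a training loss: a "desired" signal is assembled from whichever input has more local contrast at each location, and the fused output is penalized for disagreeing with it structurally. This is a simplified illustration, not the paper's exact MEF-SSIM formulation; the window size, contrast-based weighting, and stabilizing constant `c` are placeholder choices.

```python
# Simplified sketch of a no-reference fusion loss (NOT the exact MEF-SSIM
# used in the paper): build a "desired" image from the higher-contrast
# input at each location, then measure structural agreement with the
# fused output.
import torch
import torch.nn.functional as F


def local_stats(x, ksize=7):
    """Per-pixel local mean and standard deviation via average pooling."""
    pad = ksize // 2
    mu = F.avg_pool2d(x, ksize, stride=1, padding=pad)
    var = F.avg_pool2d(x * x, ksize, stride=1, padding=pad) - mu * mu
    return mu, var.clamp(min=0.0).sqrt()


def no_reference_fusion_loss(fused, under, over, c=1e-4):
    """Loss = 1 - mean structural agreement between the fused image and a
    contrast-weighted combination of the two input exposures."""
    mu_u, sd_u = local_stats(under)
    mu_o, sd_o = local_stats(over)
    w_u = sd_u / (sd_u + sd_o + c)            # contrast-based weighting
    desired = w_u * under + (1.0 - w_u) * over
    mu_f, sd_f = local_stats(fused)
    mu_d, sd_d = local_stats(desired)
    cov = F.avg_pool2d(fused * desired, 7, stride=1, padding=3) - mu_f * mu_d
    ssim_like = (2 * cov + c) / (sd_f ** 2 + sd_d ** 2 + c)
    return 1.0 - ssim_like.mean()
```

Because the score is computed only from the fused output and the inputs themselves, no ground truth fused image is ever needed during training.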

The architecture comprises distinct stages for feature extraction, a novel fusion layer, and reconstruction, which together synthesize the final image without requiring a reference ground truth. Weights are tied across the two input branches, so both exposures are mapped into a common feature space, lending the fusion robustness to brightness variations. The fusion stage combines similarity-weighted feature maps, allowing the network to adaptively balance the brightness and structural integrity of the output.
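
A minimal PyTorch sketch of this layout is shown below: a shared (tied-weight) feature extractor applied to both exposures, fusion by combining the resulting feature maps, and a reconstruction stage that maps the fused features back to an image. Layer counts, kernel sizes, and channel widths here are illustrative choices, not the paper's exact configuration.

```python
# Illustrative DeepFuse-style network: tied-weight feature extraction,
# feature-map fusion, and reconstruction. Hyperparameters are placeholders.
import torch
import torch.nn as nn


class DeepFuseSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Tied weights: the same module processes both exposures.
        self.extract = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=7, padding=3), nn.ReLU(inplace=True),
        )
        self.reconstruct = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=5, padding=2),
        )

    def forward(self, under, over):
        f_under = self.extract(under)   # features from under-exposed input
        f_over = self.extract(over)     # features from over-exposed input
        fused = f_under + f_over        # fusion layer: combine feature maps
        return self.reconstruct(fused)
```

Sharing the extractor is what makes combining the feature maps meaningful: both exposures are encoded with identical filters, so corresponding channels describe the same kind of structure regardless of brightness.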

Empirical Evaluation

Quantitative and qualitative assessments demonstrate that DeepFuse surpasses several state-of-the-art methods. Across a variety of test sequences, the approach consistently delivers superior performance, benchmarked using MEF SSIM, a no-reference structural similarity metric. Notably, DeepFuse remains stable under large exposure differences, maintaining uniform luminance and avoiding the artifacts common in outputs from traditional algorithms.
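
As a hypothetical end-to-end illustration, the snippet below reuses the two sketches above, training on random placeholder patches and then reporting a no-reference quality score (1 minus the loss) in the spirit of MEF-SSIM-style benchmarking. It assumes `DeepFuseSketch` and `no_reference_fusion_loss` from the earlier code are in scope; shapes and optimizer settings are arbitrary.

```python
# Hypothetical training step plus no-reference evaluation, reusing the
# earlier sketches. Inputs are random placeholder luminance patches.
import torch

model = DeepFuseSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

under = torch.rand(4, 1, 64, 64)   # under-exposed patches (placeholder)
over = torch.rand(4, 1, 64, 64)    # over-exposed patches (placeholder)

# One unsupervised training step driven by the no-reference loss.
loss = no_reference_fusion_loss(model(under, over), under, over)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Score a fused result without any ground truth reference.
with torch.no_grad():
    score = 1.0 - no_reference_fusion_loss(model(under, over), under, over)
print(f"no-reference quality score: {score.item():.3f}")
```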

Implications and Future Directions

DeepFuse contributes a significant advancement to MEF by providing a framework that not only streamlines the computational process but also emphasizes perceptual quality in the absence of explicit supervision. The practical implications are substantial, as this method could be integrated into applications requiring high dynamic range (HDR) imaging, reducing power and storage requirements while improving computational efficiency.

The scope for future development includes expanding the model’s versatility to accommodate dynamically moving objects, thus broadening its application range. Moreover, further exploration into perceptually driven loss functions, inspired by this research, may lead to more sophisticated models capable of addressing other low-level vision tasks.

DeepFuse sets a precedent for utilizing deep learning in MEF, offering a robust alternative that aligns closely with perceptual expectations. It provides a framework for others in the field to build upon, suggesting a shift towards no-reference, deep learning paradigms in image fusion applications.