- The paper's main contribution is modeling the inverse camera pipeline by decomposing dequantization, linearization, and hallucination into three CNNs.
- It demonstrates performance superior to state-of-the-art methods on quantitative metrics such as HDR-VDP-2, supported by qualitative comparisons and a user study.
- The approach offers practical advantages for reconstructing HDR images from single exposures, benefiting archival photography and online imagery.
Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline: An Expert Analysis
The research presented in the paper addresses the challenge of reconstructing a High Dynamic Range (HDR) image from a single Low Dynamic Range (LDR) input. By explicitly reversing the stages of the camera imaging pipeline, the method advances over previous deep learning approaches that learn a direct LDR-to-HDR mapping, integrating domain knowledge into the model design.
Core Contributions
This paper's primary contribution is its novel approach to modeling the inverse of the LDR image formation pipeline. The authors decompose the problem into three sub-tasks (dequantization, linearization, and hallucination), corresponding to reversing the quantization, the non-linear camera response function, and the dynamic-range clipping inherent in camera image formation. They employ three specialized Convolutional Neural Networks (CNNs) for these tasks and train them with loss functions and physical constraints appropriate to each stage; a minimal sketch of how the stages compose follows the list below.
- Dequantization-Net: This network focuses on eliminating quantization artifacts commonly found in LDR images, such as banding in smooth regions.
- Linearization-Net: Responsible for estimating the Camera Response Function (CRF), this network incorporates edge and histogram features to convert the image back to a linear representation. It represents the CRF with the empirical EMoR model, predicting weights over a basis of representative response curves.
- Hallucination-Net: Designed to address missing information in over-exposed regions, the Hallucination-Net reconstructs these details using principles of image completion while maintaining constraints inherent to image formation.
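To make the decomposition concrete, the following is a minimal PyTorch sketch of how the three stages compose. The tiny convolutional stacks, the fixed gamma-like curve basis standing in for the learned EMoR representation, the 0.95 over-exposure threshold, and the residual formulations are illustrative assumptions; the paper's actual networks are substantially deeper and are trained with stage-specific losses.

```python
# Minimal sketch of the composed inverse pipeline (illustrative assumptions, not the
# authors' exact architecture): tiny conv stacks stand in for the real networks, and a
# fixed gamma-like curve basis stands in for the learned EMoR representation.
import torch
import torch.nn as nn


def tiny_cnn(in_ch, out_ch):
    """Stand-in for a full encoder-decoder; just enough to keep the sketch runnable."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, out_ch, 3, padding=1),
    )


class InversePipelineSketch(nn.Module):
    def __init__(self, num_basis=11, threshold=0.95):
        super().__init__()
        self.dequant = tiny_cnn(3, 3)             # Dequantization-Net: residual de-banding
        self.crf_head = nn.Sequential(            # Linearization-Net: global CRF weights
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, num_basis),
        )
        # Monotone gamma-like curves as a stand-in for the EMoR basis (256 samples each).
        t = torch.linspace(0.0, 1.0, 256)
        self.register_buffer(
            "basis", torch.stack([t ** (0.6 + 0.2 * k) for k in range(num_basis)])
        )
        self.halluc = tiny_cnn(3, 3)               # Hallucination-Net: fills clipped regions
        self.threshold = threshold

    def forward(self, ldr):                        # ldr: (B, 3, H, W), values in [0, 1]
        # 1) Dequantization: predict a residual that removes banding, keep range [0, 1].
        x = torch.clamp(ldr + self.dequant(ldr), 0.0, 1.0)
        # 2) Linearization: a convex combination of basis curves acts as a per-image
        #    inverse CRF, applied as a 256-entry lookup table.
        w = torch.softmax(self.crf_head(x), dim=1)            # (B, num_basis)
        inv_crf = w @ self.basis                               # (B, 256)
        idx = (x * 255.0).long().clamp(0, 255)                 # (B, 3, H, W)
        lin = torch.stack([inv_crf[b][idx[b]] for b in range(x.shape[0])])
        # 3) Hallucination: a soft mask over near-saturated pixels gates a positive residual.
        mask = torch.clamp((x.max(dim=1, keepdim=True).values - self.threshold)
                           / (1.0 - self.threshold), 0.0, 1.0)
        hdr = lin + mask * torch.relu(self.halluc(lin))
        return hdr
```

Calling `InversePipelineSketch()(torch.rand(1, 3, 64, 64))` returns a tensor of the same shape, tracing the flow from a quantized, non-linear LDR input to a linear-domain output with hallucinated highlights.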
The paper also proposes an end-to-end joint fine-tuning process that consolidates these tasks, reducing cumulative error and improving generalization across diverse datasets.
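As a rough illustration of what joint fine-tuning optimizes, the snippet below combines per-stage reconstruction terms into one objective. The equal weighting, the plain L2 terms, and the log-domain HDR comparison are assumptions made for the sketch, not the paper's exact loss definitions.

```python
# Hedged sketch of a joint fine-tuning objective: each stage keeps its own supervision
# while the composed pipeline is optimized end to end.
import torch
import torch.nn.functional as F


def joint_loss(deq_pred, deq_gt, lin_pred, lin_gt, hdr_pred, hdr_gt,
               w_deq=1.0, w_lin=1.0, w_hal=1.0):
    """Sum of per-stage reconstruction losses used during end-to-end fine-tuning."""
    l_deq = F.mse_loss(deq_pred, deq_gt)                              # dequantized LDR target
    l_lin = F.mse_loss(lin_pred, lin_gt)                              # linear-domain target
    l_hdr = F.mse_loss(torch.log1p(hdr_pred), torch.log1p(hdr_gt))    # HDR target, log domain
    return w_deq * l_deq + w_lin * l_lin + w_hal * l_hdr
```

Because gradients of the HDR term flow back through all three networks, errors introduced by an early stage can be compensated by later ones, which is the stated motivation for the joint fine-tuning step.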
Key Insights and Methodological Rigor
The paper rigorously evaluates the proposed model's performance across several datasets, notably HDR-Synth, HDR-Real, RAISE, and HDR-Eye. Quantitative metrics such as HDR-VDP-2 scores underscore the model's superior performance compared to state-of-the-art methods like HDRCNN, DrTMO, and ExpandNet. The paper also presents qualitative visual results, highlighting the method's ability to recover fine details in HDR reconstructions with minimal artifacts or noise.
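HDR-VDP-2 itself is a MATLAB-based perceptual metric, so it is not reproduced here; as a lightweight stand-in for how reconstructions are typically compared numerically, the snippet below computes PSNR on mu-law tone-mapped HDR values. The mu value and the use of tone-mapped PSNR are illustrative assumptions, not the paper's evaluation protocol.

```python
# Simple numeric comparison of a reconstructed HDR image against ground truth using
# PSNR on mu-law tone-mapped values. This is a lightweight proxy chosen for illustration;
# it is not HDR-VDP-2 and not the metric reported in the paper.
import numpy as np


def mu_law(hdr, mu=5000.0):
    """Compress values in [0, 1] with the mu-law curve."""
    return np.log1p(mu * hdr) / np.log1p(mu)


def tonemapped_psnr(hdr_pred, hdr_gt, mu=5000.0):
    """PSNR between mu-law tone-mapped prediction and ground truth."""
    peak = hdr_gt.max() + 1e-8                    # normalize both by the ground-truth peak
    pred = np.clip(hdr_pred / peak, 0.0, 1.0)
    gt = np.clip(hdr_gt / peak, 0.0, 1.0)
    mse = np.mean((mu_law(pred, mu) - mu_law(gt, mu)) ** 2)
    return 10.0 * np.log10(1.0 / (mse + 1e-12))
```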
The authors complement the quantitative evaluation with a user study, reinforcing the model's perceptual advantages. Additionally, an ablation analysis validates the contribution of each sub-network, showing that the structured decomposition improves results over direct LDR-to-HDR mappings.
Implications and Future Directions
The method's implications are significant, especially in settings where only a single exposure is available and dynamic range enhancement is desirable, such as archival photographs or images found online. From a theoretical standpoint, the work establishes a paradigm for solving inverse imaging problems by integrating domain-specific knowledge into deep learning models.
The authors suggest that future work could refine and extend this pipeline decomposition, potentially incorporating a wider range of spatial operations typical of camera pipelines. The method could also benefit from advances in neural architecture search and the integration of additional perceptual cues.
Conclusion
This paper constitutes a valuable addition to the field of HDR imaging by showcasing the efficacy of reversing the camera pipeline. By embedding domain knowledge into neural networks, the authors present a robust model that significantly enhances single-image HDR reconstruction. It is a noteworthy effort that combines methodological rigor with an innovative application of learned models, providing a promising foundation for continued research and for practical use in image processing.