
Ghost-free High Dynamic Range Imaging with Context-aware Transformer

Published 10 Aug 2022 in cs.CV (arXiv:2208.05114v1)

Abstract: High dynamic range (HDR) deghosting algorithms aim to generate ghost-free HDR images with realistic details. Restricted by the locality of the receptive field, existing CNN-based methods are typically prone to producing ghosting artifacts and intensity distortions in the presence of large motion and severe saturation. In this paper, we propose a novel Context-Aware Vision Transformer (CA-ViT) for ghost-free high dynamic range imaging. The CA-ViT is designed as a dual-branch architecture, which can jointly capture both global and local dependencies. Specifically, the global branch employs a window-based Transformer encoder to model long-range object movements and intensity variations to solve ghosting. For the local branch, we design a local context extractor (LCE) to capture short-range image features and use the channel attention mechanism to select informative local details across the extracted features to complement the global branch. By incorporating the CA-ViT as basic components, we further build the HDR-Transformer, a hierarchical network to reconstruct high-quality ghost-free HDR images. Extensive experiments on three benchmark datasets show that our approach outperforms state-of-the-art methods qualitatively and quantitatively with considerably reduced computational budgets. Codes are available at https://github.com/megvii-research/HDR-Transformer

Citations (60)

Summary

  • The paper introduces a novel CA-ViT architecture that reconciles global and local image features to effectively remove ghosting artifacts in HDR imaging.
  • The methodology integrates Vision Transformers with CNN-based local context extractors, achieving significant improvements in PSNR and HDR-VDP-2 metrics.
  • The HDR-Transformer provides a computationally efficient solution that outperforms state-of-the-art models on benchmark datasets, advancing HDR imaging in dynamic conditions.

Ghost-free High Dynamic Range Imaging with Context-aware Transformer

The paper presents a significant advancement in high dynamic range (HDR) imaging, focusing on HDR deghosting. The authors propose a novel Context-Aware Vision Transformer (CA-ViT) that addresses the ghosting artifacts and intensity distortions arising from large motions and severe saturation. The CA-ViT is designed as a dual-branch architecture adept at capturing both global and local dependencies within image frames, leveraging the complementary strengths of Vision Transformers (ViTs) and convolutional neural networks (CNNs) to improve HDR reconstruction.

The CA-ViT's global branch uses a window-based Transformer encoder to model long-range dependencies and intensity variations. Concurrently, the local branch integrates a local context extractor (LCE) paired with a channel attention mechanism, which extracts short-range image features and selects the most informative local details. The two branches complement each other, ensuring comprehensive context awareness and strengthening the HDR deghosting process.
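The dual-branch fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it uses a single attention head, identity Q/K/V projections, and random (untrained) weights for the SE-style channel attention, purely to show how the window-attention branch and the channel-attention branch are combined over the same tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(windows, d):
    # windows: (num_windows, window_len, d); single-head self-attention
    # within each window. Identity Q/K/V projections, for illustration only.
    q = k = v = windows
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v

def channel_attention(feats, w1, w2):
    # feats: (n, d). Squeeze over tokens, then gate each channel
    # (SE-style squeeze-and-excitation, a stand-in for the LCE branch).
    squeezed = feats.mean(axis=0)                                  # (d,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ squeezed, 0))))
    return feats * gate                                            # rescale channels

def ca_vit_block(x, window_len, w1, w2):
    # x: (n, d) token features, n divisible by window_len.
    n, d = x.shape
    windows = x.reshape(n // window_len, window_len, d)
    global_out = window_attention(windows, d).reshape(n, d)  # global branch
    local_out = channel_attention(x, w1, w2)                 # local branch
    return global_out + local_out                            # fuse branches

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
w1 = 0.1 * rng.standard_normal((4, 4))
w2 = 0.1 * rng.standard_normal((4, 4))
out = ca_vit_block(x, window_len=4, w1=w1, w2=w2)
```

The key design point the sketch captures is that both branches see the same token features: the windowed attention handles long-range movement and intensity change, while the channel gate re-weights locally extracted details before the two outputs are merged.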

Building upon CA-ViT, the paper introduces the HDR-Transformer, a hierarchically structured network engineered to reconstruct high-quality, ghost-free HDR images efficiently. This framework is validated through exhaustive experimentation across three benchmark datasets, where it demonstrates superior qualitative and quantitative performance against state-of-the-art methodologies. Importantly, the proposed approach achieves these results with substantially reduced computational demands, offering a practical solution for real-world HDR imaging challenges.

Key findings from experimental data underscore the effectiveness of HDR-Transformer, with marked improvements over existing models on metrics such as PSNR and SSIM in both linear and tonemapped domains. Notably, on Kalantari et al.'s test set, the HDR-Transformer achieves a PSNR-μ of 44.32 and an HDR-VDP-2 score of 66.03, surpassing existing benchmarks. Qualitative assessments likewise show that the HDR-Transformer suppresses ghosting artifacts under extreme motion and saturation, conditions where conventional CNN-based methods struggle.
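The PSNR-μ metric cited above is PSNR computed after μ-law tonemapping, the standard evaluation protocol in HDR deghosting work. A minimal sketch, assuming HDR values normalized to [0, 1] and the commonly used μ = 5000 (specific constants here are conventions from the literature, not values stated in this summary):

```python
import numpy as np

def mu_tonemap(h, mu=5000.0):
    # mu-law tonemapping: compresses linear HDR radiance into a
    # display-like range before computing image-quality metrics.
    return np.log(1.0 + mu * h) / np.log(1.0 + mu)

def psnr(a, b, peak=1.0):
    # Peak signal-to-noise ratio in dB.
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_mu(pred, gt):
    # PSNR measured in the tonemapped (mu-law) domain.
    return psnr(mu_tonemap(pred), mu_tonemap(gt))

gt = np.full((4, 4), 0.5)
close = psnr_mu(gt + 0.001, gt)   # small error -> high PSNR-mu
far = psnr_mu(gt + 0.01, gt)      # larger error -> lower PSNR-mu
```

Evaluating in the tonemapped domain weights errors the way a viewer of the final displayed image would, which is why deghosting papers report PSNR-μ alongside linear-domain PSNR.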

The implications of this research are profound, with the potential to substantially enhance the fidelity of HDR imaging in diverse dynamic environments, ranging from consumer photography to professional videography. The facilitation of both global and local context understanding via the CA-ViT architecture paves the way for future developments in HDR imaging, possibly extending into other areas of computer vision that require similar dual-branch processing capabilities.

In summary, this paper contributes a robust, efficient, and adaptable HDR deghosting framework, highlighting the promising capabilities of Transformers in vision applications previously dominated by CNNs. The architectural insights and empirical validations presented not only advance the current state of HDR imaging but also set a new trajectory for future research in enhancing image quality under dynamic conditions using hybrid neural architectures.
