- The paper presents an unpaired CycleGAN approach employing a novel multiscale patch-based method to enhance low-light UHR sequence quality.
- It integrates adaptive temporal smoothing with weighted loss functions to minimize flickering and preserve texture detail.
- Experimental results show significant quality improvements, underscoring potential for applications in professional filmmaking and security surveillance.
Unpaired-Learning Approach for Contextual Colorization and Denoising in Low-Light Ultra High Resolution Sequences: A Summary
The paper presents an unpaired-learning framework for enhancing ultra high resolution (UHR) sequences captured in low-light conditions. Such sequences are frequently degraded by noise, flickering, and motion blur, which harm visual quality and hinder downstream automated tasks such as object detection and classification. Traditional enhancement methods often fall short, forcing manual, time-intensive post-processing. Deep networks, particularly CycleGANs, promise to automate this work, though hardware constraints and the absence of paired training data make the problem challenging.
To address these challenges, the authors adapt the CycleGAN framework specifically for UHR sequences. A central contribution is a novel multiscale patch-based methodology that works around memory limitations by operating on local and contextual patches rather than full frames. This lets the model capture both local texture and global context, which are essential for realistic enhancement without excessive computational cost.
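The multiscale patch idea can be illustrated with a minimal NumPy sketch: for each sampled location, pair a fine local crop with a larger surrounding region that is subsampled back to the local size. The function name, patch sizes, and the nearest-neighbour subsampling are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def extract_multiscale_patches(frame, center, local_size=64, region_scale=4):
    """Extract a fine local patch and a coarser region patch around `center`.

    The region patch covers `region_scale` times the local extent and is
    subsampled back to `local_size`, pairing texture detail with context.
    (Sizes and the pairing scheme are illustrative, not the paper's.)
    """
    h, w = frame.shape[:2]
    cy, cx = center

    def crop(size):
        half = size // 2
        y0, x0 = max(cy - half, 0), max(cx - half, 0)
        y1, x1 = min(y0 + size, h), min(x0 + size, w)
        y0, x0 = y1 - size, x1 - size  # shift window back inside the frame
        return frame[y0:y1, x0:x1]

    local = crop(local_size)
    region = crop(local_size * region_scale)
    # Nearest-neighbour subsample so both patches share spatial dimensions.
    region = region[::region_scale, ::region_scale]
    return local, region
```

Feeding both patches to the generator gives it fine texture and surrounding context at the same spatial resolution, which is what keeps memory use bounded on UHR frames.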
The proposed model architecture comprises the following key components:
- Patch Generation: Each frame is divided into small local patches and larger region patches, capturing both fine-grained texture and broader contextual information. These paired patches supply the patterns needed for accurate translation between low-light inputs and enhanced outputs.
- CycleGAN Structure: The network adapts the CycleGAN architecture with more memory-efficient generators and discriminators, and employs a relativistic average GAN objective to stabilize training and improve the realism of generated images.
- Loss Functions: A weighted combination of adversarial, cycle-consistency, and identity losses optimizes both local detail and overall perceptual quality, preserving texture integrity while producing visually appealing output.
- Adaptive Temporal Smoothing: Because video frames are temporally dependent, the paper introduces an adaptive smoothing technique that reduces flicker by adjusting the width of a temporal sliding window according to motion analysis, preserving temporal coherence across frames.
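The loss design can be sketched in NumPy: a relativistic average adversarial term (fakes should look more realistic than the average real, and vice versa) combined with L1 cycle-consistency and identity terms. The weights shown follow common CycleGAN practice; the paper's actual weighting and the exact BCE formulation here are assumptions.

```python
import numpy as np

def relativistic_avg_loss(real_logits, fake_logits):
    """Relativistic average GAN loss for the generator (BCE form):
    fakes should score above the average real, reals below the average fake."""
    def bce(logits, target):
        p = 1.0 / (1.0 + np.exp(-logits))
        return -np.mean(target * np.log(p + 1e-8)
                        + (1 - target) * np.log(1 - p + 1e-8))
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return bce(real_rel, 0.0) + bce(fake_rel, 1.0)

def total_generator_loss(real, recon, ident, d_real, d_fake,
                         w_adv=1.0, w_cyc=10.0, w_idt=5.0):
    """Weighted sum of adversarial, cycle-consistency (L1), and identity (L1)
    terms. Weights are typical CycleGAN defaults, not the paper's values."""
    l_adv = relativistic_avg_loss(d_real, d_fake)
    l_cyc = np.mean(np.abs(real - recon))   # real -> fake -> reconstructed
    l_idt = np.mean(np.abs(real - ident))   # generator applied to own domain
    return w_adv * l_adv + w_cyc * l_cyc + w_idt * l_idt
```

The large cycle-consistency weight is what anchors unpaired training: without paired targets, the reconstruction term is the main signal that content must survive the round trip.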
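The adaptive temporal smoothing step can likewise be sketched: average each frame over a sliding temporal window whose width collapses when inter-frame motion is high, so static regions are de-flickered while moving content is not blurred. The mean-absolute-difference motion proxy and the hard threshold are simple stand-ins for the paper's motion analysis.

```python
import numpy as np

def adaptive_temporal_smooth(frames, max_window=5, motion_thresh=0.1):
    """Smooth a frame sequence with a motion-adaptive temporal window.

    Low motion -> average over up to `max_window` frames (suppresses flicker);
    high motion -> window of 1 (the frame passes through unblurred).
    The thresholding rule is an illustrative simplification.
    """
    frames = np.asarray(frames, dtype=float)
    out = np.empty_like(frames)
    for t in range(len(frames)):
        # Mean absolute difference to the previous frame as a motion proxy.
        motion = np.abs(frames[t] - frames[t - 1]).mean() if t > 0 else 0.0
        win = 1 if motion > motion_thresh else max_window
        lo, hi = max(0, t - win // 2), min(len(frames), t + win // 2 + 1)
        out[t] = frames[lo:hi].mean(axis=0)
    return out
```

A production version would replace the global threshold with per-region motion estimates, since a single scene usually mixes static background and moving subjects.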
In experimental evaluations, the proposed framework significantly outperforms existing methods at enhancing low-light UHR sequences. The model proves robust across varying input brightness levels, maintaining high subjective quality without introducing additional noise artifacts.
The paper's findings have substantial implications for fields requiring high-fidelity visual outputs from high-resolution, low-light data, such as professional filmmaking and security surveillance. The proposed methods could be extended with further developments in real-time processing, potentially integrating more comprehensive motion estimation techniques or multi-frame GAN architectures.
This research demonstrates that unpaired image translation can serve real-world applications where paired datasets are impractical to obtain. Future work could explore tighter adaptation to hardware constraints and more sophisticated neural architectures, improving both the efficiency and the output quality of low-light sequence enhancement.