Overview of "EnlightenGAN: Deep Light Enhancement without Paired Supervision"
This paper introduces EnlightenGAN, an unsupervised generative adversarial network (GAN) for low-light image enhancement. The central challenge it tackles is the scarcity of paired low-light and normal-light images needed to train supervised deep learning models. By training on unpaired data, EnlightenGAN circumvents the need for corresponding image pairs, making it far more practical for real-world scenarios.
Methodology
EnlightenGAN hinges on several innovative techniques to address the limitations of prior methods in low-light image enhancement:
- Global-Local Discriminator Structure: EnlightenGAN employs a dual-discriminator approach that simultaneously evaluates the global and local quality of enhanced images. This structure balances global illumination enhancement with local adjustments to handle spatially varying light conditions.
- Self-Regularized Perceptual Loss: Instead of relying on ground-truth images, EnlightenGAN uses a self-regularized perceptual loss computed between the VGG-extracted features of the low-light input and its enhanced output, which preserves structural and textural consistency (see the sketches after this list).
- Attention Mechanism: The model incorporates a self-regularized attention map derived from the illumination channel of the input image. This guidance helps the generator concentrate enhancement on darker regions while preventing over-exposure in already well-lit areas (sketched below).
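As a concrete illustration of the attention mechanism, here is a minimal PyTorch sketch. The function name and the max-over-channels definition of illumination are illustrative assumptions in the same spirit as the paper's normalized illumination channel, not its exact implementation:

```python
import torch

def illumination_attention(rgb):
    """Self-regularized attention map over a batch of RGB images in [0, 1].

    Takes a per-pixel illumination estimate (here: max over the RGB
    channels, one common choice) and returns 1 - I, so dark pixels receive
    values near 1 and bright pixels near 0. The generator can multiply this
    map into its feature maps and output to focus enhancement on dark regions.
    """
    illumination = rgb.max(dim=1, keepdim=True).values  # shape (N, 1, H, W)
    return 1.0 - illumination
```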
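Likewise, a minimal sketch of the self-regularized perceptual loss, assuming PyTorch/torchvision; the VGG-16 cut-off layer is an illustrative choice, not necessarily the paper's exact configuration:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG-16 feature extractor; slicing up to relu3_3 is illustrative.
_vgg = vgg16(pretrained=True).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def self_perceptual_loss(low_light, enhanced):
    """Self-regularized perceptual loss: distance between VGG features of
    the low-light input and its enhanced output, encouraging the generator
    to change illumination while preserving content and texture."""
    return F.mse_loss(_vgg(enhanced), _vgg(low_light))
```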
Architectural Details
The architecture of EnlightenGAN includes:
- U-Net Generator: Utilizes multi-scale context information to extract and preserve rich textures, augmented by an attention mechanism to adjust enhancement intensity.
- Global Discriminator: Uses a relativistic least-squares adversarial loss to judge whether the whole enhanced image looks realistic relative to real normal-light images, driving the global illumination adjustment.
- Local Discriminator: Applies a standard least-squares GAN (LSGAN) loss to randomly cropped patches from enhanced and real images, refining local regions that global enhancement alone would miss.
The generator's output is refined by imposing the self-regularized perceptual loss at both global and local scales alongside global and local adversarial losses, ensuring realistic, high-quality enhancement; the adversarial objectives are sketched below.
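The following sketch is one plausible reading of the relativistic least-squares objective for the global discriminator and the random-patch cropping feeding the local one; function names, patch count, and patch size are illustrative assumptions rather than the paper's exact code:

```python
import torch

def global_d_loss(real_logits, fake_logits):
    """Relativistic least-squares loss for the global discriminator:
    each side is scored relative to the mean score of the other side."""
    d_real = real_logits - fake_logits.mean()
    d_fake = fake_logits - real_logits.mean()
    return ((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean()

def global_g_loss(real_logits, fake_logits):
    """Generator counterpart of the relativistic least-squares loss."""
    d_real = real_logits - fake_logits.mean()
    d_fake = fake_logits - real_logits.mean()
    return ((d_fake - 1.0) ** 2).mean() + (d_real ** 2).mean()

def random_patches(images, num_patches=5, size=32):
    """Crop random patches for the local discriminator, which applies a
    standard LSGAN loss to each patch (patch count/size are illustrative)."""
    n, _, h, w = images.shape
    crops = []
    for _ in range(num_patches):
        top = int(torch.randint(0, h - size + 1, (1,)))
        left = int(torch.randint(0, w - size + 1, (1,)))
        crops.append(images[:, :, top:top + size, left:left + size])
    return torch.cat(crops, dim=0)
```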
Empirical Evaluation
EnlightenGAN has been rigorously tested against several state-of-the-art methods using both qualitative and quantitative metrics:
- Visual Quality: When applied to diverse datasets (e.g., NPE, LIME, MEF, DICM), EnlightenGAN consistently produced images with better illumination balance and fewer artifacts compared to methods like RetinexNet, LLNet, and CycleGAN.
- No-Reference Image Quality Assessment: Utilizing the NIQE index, EnlightenGAN outperformed other methods on three out of five benchmark datasets, demonstrating superior perceptual quality.
- Human Subjective Evaluation: Through a Bradley-Terry model analysis of pairwise comparisons, EnlightenGAN achieved the highest average ranking in human subjective studies, significantly outperforming LIME and RetinexNet.
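For context, the Bradley-Terry model converts pairwise preference counts into a global ranking by assigning each method i a latent quality score s_i and modeling P(i preferred over j) = exp(s_i) / (exp(s_i) + exp(s_j)); the scores are fit by maximum likelihood, and the ranking reported above is the ordering of these fitted scores.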
Adaptation for Real-World Scenarios
The unpaired training capability of EnlightenGAN was further validated on the BDD-100k dataset, showcasing its adaptability to real-world low-light conditions without requiring paired high-quality counterparts. The domain-adapted version (EnlightenGAN-N) effectively reduced noise and enhanced illumination in practical, noisy low-light images.
Implications and Future Directions
Practical Implications:
- EnlightenGAN's unpaired training removes the constraints of paired datasets, permitting broader deployment across various domains such as autonomous driving, surveillance, and consumer photography.
- The dual-discriminator and self-regularized mechanisms demonstrate robust performance in diverse lighting scenarios, mitigating artifacts and local inconsistencies often observed in previous methods.
Theoretical Implications:
- The integration of a global-local discriminator structure and self-regularized perceptual losses opens new pathways for unpaired image enhancement tasks beyond low-light enhancement.
- The innovative use of attention mechanisms guided by inherent image properties (e.g., illumination levels) is a promising direction for enhancing perceptual quality in unsupervised learning frameworks.
Future Directions:
- Research might extend EnlightenGAN to other low-light vision tasks, such as video enhancement, where temporal coherency between successive frames could be enforced.
- Further developments could explore multi-modal enhancements, integrating additional sensory data (e.g., infrared imaging) to enhance robustness under extreme low-light conditions.
By addressing critical challenges in low-light image enhancement without relying on paired datasets, EnlightenGAN lays the groundwork for future advancements in unsupervised deep learning and real-world image enhancement applications.