WESPE: Weakly Supervised Photo Enhancer for Digital Cameras (1709.01118v2)

Published 4 Sep 2017 in cs.CV

Abstract: Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that translates photos taken by cameras with limited capabilities into DSLR-quality photos automatically. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can be generally crawled from the Internet - the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. In this work, we emphasize extensive evaluation of the obtained results. Besides standard objective metrics and a subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data and use this network to get reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from several generations of smartphones demonstrate that WESPE produces qualitative results comparable to or better than those of state-of-the-art strongly supervised methods.

Summary

The paper introduces WESPE, a weakly supervised GAN-based framework for enhancing low-quality digital camera images to DSLR quality without requiring paired training data.
Experimental evaluation using objective metrics, subjective studies, and a virtual rater demonstrates WESPE's superior performance compared to traditional software and competitive results against supervised methods.
WESPE offers a flexible and scalable solution for improving image quality in mobile devices and other applications, challenging the necessity of strong supervision in photo enhancement tasks.

An Analysis of "WESPE: Weakly Supervised Photo Enhancer for Digital Cameras"

The paper "WESPE: Weakly Supervised Photo Enhancer for Digital Cameras", authored by Andrey Ignatov et al., presents an approach to enhancing the quality of photos captured by low-end digital cameras. Its primary focus is a deep learning-based solution that transforms low-quality images into DSLR-quality outputs without requiring extensive data annotations or paired datasets for direct supervision.

Methodological Overview

The researchers propose a novel framework, WESPE, which stands for Weakly Supervised Photo Enhancer. It builds on a Generative Adversarial Network (GAN) architecture tailored to image-to-image translation. The model is trained on two unpaired datasets: one containing low-quality images from the source camera and another comprising high-quality images representing the target domain. The key innovation is that aligned image pairs are no longer needed, which makes the method applicable to a wide range of source cameras and scenarios.
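To make the unpaired setup concrete, the sketch below shows one way training batches could be drawn: source-camera photos and high-quality photos are sampled independently, so no index links a source image to a target image. The function name `sample_unpaired_batch` and the file names are hypothetical and are not taken from the paper.

```python
import random

def sample_unpaired_batch(source_images, target_images, batch_size, rng):
    """Draw two independent batches; nothing ties source[i] to target[i]."""
    # The two datasets may differ in size and in visual content.
    source_batch = rng.sample(source_images, batch_size)
    target_batch = rng.sample(target_images, batch_size)
    return source_batch, target_batch

# Hypothetical file lists standing in for the two datasets.
phone_photos = [f"phone_{i:03d}.jpg" for i in range(100)]
dslr_photos = [f"web_hq_{i:03d}.jpg" for i in range(250)]

rng = random.Random(0)
src, tgt = sample_unpaired_batch(phone_photos, dslr_photos, 4, rng)
```

Because the two batches are drawn separately, the high-quality set can be replaced or enlarged without touching the source-camera set.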

The proposed WESPE architecture incorporates:

GAN-based Generative Mapping: It learns a generative mapping from the source domain to the target domain through adversarial training, without needing paired data.
Content and Texture Losses: It integrates specific losses that ensure content and texture consistency between original and enhanced images, based on VGG-19 feature maps.
Adversarial Discriminators: Two discriminators are used: one focusing on color differences and the other on texture discrepancies, each enhancing different quality aspects of images to mimic the high-quality image domain.
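The summary above does not say how each discriminator is restricted to color or to texture. One plausible way, assumed here purely for illustration, is to preprocess the discriminator inputs: blurring removes fine texture so that mostly color and brightness remain, while converting to grayscale removes color so that mostly texture remains. In this sketch a simple box blur stands in for whatever smoothing filter is actually used, and the function names are hypothetical.

```python
def to_grayscale(image):
    """Drop color: keep one luminance value per pixel (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]

def box_blur(image, radius=1):
    """Drop fine texture: average each channel over a (2*radius+1)^2 window."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect neighbours, clipping the window at the image border.
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            row.append(tuple(sum(p[c] for p in window) / len(window)
                             for c in range(3)))
        blurred.append(row)
    return blurred

# A tiny 2x2 checkerboard of pure red and pure blue pixels.
RED, BLUE = (255, 0, 0), (0, 0, 255)
image = [[RED, BLUE],
         [BLUE, RED]]

texture_input = to_grayscale(image)  # what a texture discriminator might see
color_input = box_blur(image)        # what a color discriminator might see
```

On this toy image the blurred version is a uniform color, so the checkerboard pattern is gone, while the grayscale version keeps the pattern but loses the red/blue distinction.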

Experimental Evaluation

A comprehensive set of experiments evaluates the efficacy of WESPE using datasets such as DPED, KITTI, and Cityscapes, as well as newly acquired images from smartphones like HTC One M9, Huawei P9, and iPhone 6. The results are compelling, demonstrating that WESPE can produce enhanced images with quality surpassing traditional enhancement software and competing favorably with supervised learning models that require aligned datasets.

Objective Metrics: The model's performance was robust across metrics like PSNR and SSIM, highlighting its ability to retain structural and perceptual fidelity in enhanced images.
Subjective Assessment: User studies reinforced the quantitative findings, showing a preference for WESPE-enhanced images over original and software-enhanced photos.
Quality Assessment via Flickr Faves Score (FFS): A CNN trained on Flickr user data served as a scalable virtual rater, providing reference scores for both original and enhanced photos and further supporting WESPE's ability to improve perceived image quality.
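For readers unfamiliar with the objective metrics mentioned above, the following is a minimal sketch of PSNR using its standard definition, 10 * log10(MAX^2 / MSE). The pixel values are made up for illustration and are not results from the paper. The function assumes two equally sized images given as flat lists of 8-bit intensities.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Flattened 8-bit grayscale pixels (made-up values for illustration).
reference = [52, 55, 61, 59]
enhanced = [54, 55, 60, 57]
score = psnr(reference, enhanced)
```

Note that PSNR needs a pixel-aligned reference image, so it can only be computed on datasets where such a reference exists.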

Implications and Future Directions

The implications of WESPE's development are significant in both practical and theoretical domains. Practically, it is a flexible framework that mobile device manufacturers and software developers can adapt to improve image quality without the upfront cost of acquiring paired datasets. Theoretically, it challenges the prevailing assumptions about the necessity of strong supervision in enhancing photo quality, suggesting that weak supervision could become a more prevalent paradigm in other image processing tasks.

Future work could extend WESPE to video processing or integrate it into real-time image enhancement applications. In addition, a better understanding of how to balance the loss components could yield further improvements in image quality across diverse photo sets.

In conclusion, the paper underscores a significant advancement in photo enhancement methodologies, providing a scalable, efficient alternative to traditional, heavily supervised models. Given its ease and speed of deployment across various camera types, WESPE represents a noteworthy evolution in the domain of computational photography and image enhancement.