- The paper introduces a novel residual CNN that maps mobile photos to DSLR-like images by enhancing color, texture, and sharpness.
- It employs a composite loss function combining VGG-19 based content loss, Gaussian-blur color loss, and adversarial texture loss to achieve perceptual quality.
- Experimental evaluations using PSNR, SSIM, and user studies confirm that the enhanced images are often indistinguishable from true DSLR photos.
DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks
The paper "DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks" by Andrey Ignatov et al. addresses the challenge of enhancing photos taken by mobile device cameras to match the quality of DSLR cameras. This gap stems from inherent physical limitations of mobile cameras, such as small sensors and compact lenses.
Key Contributions
The primary contribution of this research is a novel end-to-end learning framework based on deep convolutional networks designed to elevate the quality of smartphone camera photos to that of DSLR-quality images. The authors present several major innovations:
- Residual Convolutional Neural Network (CNN): A network architecture is designed to learn a translation function that enhances both color rendition and image sharpness. This is accomplished using residual connections to effectively map mobile photos to DSLR-quality images.
- Advanced Loss Function: Because the standard mean squared error (MSE) loss is inadequate for capturing perceptual quality, a composite perceptual error function is introduced that combines content, color, and texture losses:
- Content Loss: Based on VGG-19 network activations, preserving the semantic content between original and enhanced images.
- Color Loss: Applies a Gaussian blur to both images before comparing them, measuring similarity in color distribution while remaining tolerant to small pixel misalignments.
- Texture Loss: Employs adversarial learning to ensure realistic texture details, leveraging a discriminator to guide the enhancement process.
- Dataset Creation (DPED): A comprehensive dataset called the DSLR Photo Enhancement Dataset (DPED) is developed. It contains photos captured by both mobile and DSLR cameras, facilitating supervised learning. This dataset aids in training networks to generalize image enhancement tasks across various camera types.
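Of the three loss terms, the color loss is the most self-contained to illustrate. Below is a minimal NumPy sketch of the blur-then-compare idea: both images are smoothed with a separable Gaussian kernel and only then compared with MSE, so fine texture and slight misalignment stop dominating the error. The kernel size and sigma here are illustrative assumptions, not the paper's exact settings, and the paper applies this inside a training loop rather than as a standalone function.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img: np.ndarray, sigma: float = 3.0, radius: int = 6) -> np.ndarray:
    """Separable Gaussian blur applied independently to each channel of an (H, W, C) image."""
    k = gaussian_kernel_1d(sigma, radius)
    out = np.empty(img.shape, dtype=np.float64)
    for c in range(img.shape[2]):
        ch = img[:, :, c].astype(np.float64)
        # Blur rows, then columns (separability of the Gaussian).
        ch = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, ch)
        ch = np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, ch)
        out[:, :, c] = ch
    return out

def color_loss(enhanced: np.ndarray, target: np.ndarray) -> float:
    """MSE between blurred images: compares overall color rendition
    while down-weighting texture and small spatial shifts."""
    diff = gaussian_blur(enhanced) - gaussian_blur(target)
    return float(np.mean(diff ** 2))
```

Because the blur removes high-frequency detail, a one-pixel shift between the two images yields a much smaller color loss than a raw per-pixel MSE would, which is exactly the alignment tolerance the paper needs for its weakly aligned phone/DSLR pairs.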
Experimental Evaluation
Quantitative evaluations using PSNR and SSIM showed that images enhanced by the proposed method approach the quality of the target DSLR photos. Subjective user studies corroborated these findings: participants often could not distinguish the enhanced images from actual DSLR photos.
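For reference, the two quantitative metrics are straightforward to compute. Below is a NumPy sketch of PSNR and a simplified, single-window SSIM; the published SSIM uses a sliding Gaussian window over local statistics, so this global variant is an illustrative approximation, not the evaluation code used in the paper.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Simplified SSIM using global statistics (no sliding window).
    Standard stabilizing constants: C1 = (0.01 L)^2, C2 = (0.03 L)^2."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

PSNR rewards per-pixel fidelity while SSIM models luminance, contrast, and structural similarity, which is why the paper pairs both with a user study: neither metric alone fully captures perceptual quality.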
Implications and Future Directions
This work has significant practical implications:
- Consumer Photography: Enables users to achieve high-quality photos without requiring expensive equipment.
- Mobile Applications: Potential to integrate this technology into smartphone apps, providing real-time photo enhancement.
Theoretically, this research contributes to the field of image translation, offering insights into loss function designs that are perceptually informed and robust to pixel misalignment.
Future research could explore:
- Weak Supervision Techniques: Reducing dependency on paired datasets for training.
- Generalization Across Devices: Extending the method to automatically adjust for various device-specific characteristics.
- Real-Time Enhancement: Optimizing network architectures for deployment on-device, considering computational constraints.
In sum, the paper leverages deep convolutional networks to bridge the quality gap between mobile and DSLR cameras. By considering perceptual quality holistically through innovative loss functions, it sets a foundation for future advancements in image enhancement technologies.