
DeepISP: Towards Learning an End-to-End Image Processing Pipeline (1801.06724v2)

Published 20 Jan 2018 in eess.IV and cs.CV

Abstract: We present DeepISP, a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment. The training and evaluation of the pipeline were performed on a dedicated dataset containing pairs of low-light and well-lit images captured by a Samsung S7 smartphone camera in both raw and processed JPEG formats. The proposed solution achieves state-of-the-art performance in objective evaluation of PSNR on the subtask of joint denoising and demosaicing. For the full end-to-end pipeline, it achieves better visual quality compared to the manufacturer ISP, in both a subjective human assessment and when rated by a deep model trained for assessing image quality.

Citations (218)

Summary

  • The paper introduces a unified CNN-based pipeline that jointly performs demosaicing, denoising, and color correction from raw low-light images.
  • It achieves state-of-the-art performance with improvements of up to 1.28 dB PSNR and high MOS ratings compared to traditional ISPs.
  • The approach offers reduced computational overhead and potential extensions to advanced tasks like motion deblurring and HDR processing.

Overview of DeepISP: Towards Learning an End-to-End Image Processing Pipeline

The paper "DeepISP: Towards Learning an End-to-End Image Processing Pipeline," authored by Eli Schwartz, Raja Giryes, and Alex M. Bronstein, introduces DeepISP, a deep learning-based approach to building an end-to-end image processing pipeline. The pipeline aims to enhance images captured under low-light conditions by learning a direct mapping from raw image data to a processed output of high perceptual quality. Traditional ISPs (Image Signal Processors) typically follow a sequential, modular approach with specific algorithms for tasks such as demosaicing, denoising, and color correction. In contrast, the proposed method leverages convolutional neural networks (CNNs) to perform these tasks jointly, sharing information across the pipeline and potentially reducing computational overhead.

Architecture and Methodology

DeepISP consists of two main stages designed to handle both low- and high-level image processing tasks:

  1. Low-level Stage: This stage focuses on tasks like demosaicing and denoising. It includes multiple convolutional blocks that generate residual corrections, which are iteratively added to the image to enhance quality. The network uses small 3×3 convolutions to maintain efficiency and ensures a continuous flow of information through features shared across layers.
  2. High-level Stage: This part of the network deals with global image enhancements. It modifies the output of the low-level stage using a globally-learned transformation, which is particularly useful for color adjustment and other global corrections. The parameters for this transformation are determined by a fully connected layer that pools information from the preceding convolutional layers.
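The global transformation of the high-level stage can be sketched as a quadratic color transform applied per pixel. The following is an illustrative pure-Python sketch, not the authors' implementation: the 3×10 matrix `W` stands in for the parameters that the fully connected layer would predict from the image, and is initialized here to the identity transform.

```python
# Illustrative sketch of a globally-predicted quadratic color transform,
# in the spirit of DeepISP's high-level stage. In the real network, W
# would be estimated per image by a fully connected layer; here it is a
# hand-set identity placeholder.

def quadratic_features(r, g, b):
    """Lift an RGB pixel into a 10-D quadratic feature space."""
    return [r, g, b, r * r, g * g, b * b, r * g, r * b, g * b, 1.0]

def apply_color_transform(pixel, W):
    """Apply a 3x10 transform W to the quadratic features of `pixel`."""
    feats = quadratic_features(*pixel)
    return tuple(sum(w * f for w, f in zip(row, feats)) for row in W)

# Identity transform: each output channel copies its input channel.
W_identity = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # R row
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # G row
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # B row
]

print(apply_color_transform((0.2, 0.5, 0.8), W_identity))  # (0.2, 0.5, 0.8)
```

Because the transform is quadratic in the input channels, a single learned matrix can express global white balance, contrast, and cross-channel color corrections at once.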

DeepISP was evaluated on a dedicated dataset consisting of image pairs captured by a Samsung S7 smartphone, with both low-light raw and well-lit processed JPEG formats. The network was trained to output high-quality images from low-light inputs, leveraging a combination of ℓ1-norm and MS-SSIM losses to optimize perceptual quality.
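The shape of such a training objective can be sketched as a weighted mix of an ℓ1 term and a structural-similarity term. This is a simplified stand-in, not the paper's loss: DeepISP uses multi-scale SSIM, while the sketch below uses a single-window SSIM over flat pixel lists, and `alpha` is an assumed blending weight.

```python
# Simplified sketch of an L1 + SSIM training objective. The real DeepISP
# loss uses multi-scale SSIM over image windows; this single-window
# version only illustrates how the two terms are blended.

def l1_loss(pred, target):
    """Mean absolute error over flat pixel lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def ssim_simple(pred, target, c1=0.01**2, c2=0.03**2):
    """Single-window SSIM over flat pixel lists (stand-in for MS-SSIM)."""
    n = len(pred)
    mu_p = sum(pred) / n
    mu_t = sum(target) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_t = sum((t - mu_t) ** 2 for t in target) / n
    cov = sum((p - mu_p) * (t - mu_t) for p, t in zip(pred, target)) / n
    return ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))

def combined_loss(pred, target, alpha=0.5):
    """Blend a structural term with an L1 term; 0 for identical inputs."""
    return alpha * (1.0 - ssim_simple(pred, target)) + (1 - alpha) * l1_loss(pred, target)
```

The SSIM term rewards preserved local structure and contrast, while the ℓ1 term anchors absolute pixel values; mixing them is a common recipe for perceptually-driven image restoration.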

Evaluation and Results

For evaluating the DeepISP model, the authors used both objective and subjective metrics:

  • Objective Metric: For the task of joint denoising and demosaicing, the model was trained and evaluated on the MSR demosaicing dataset. It achieved state-of-the-art performance, outperforming previous methods by 0.72 dB and 1.28 dB PSNR on the Panasonic and Canon test sets, respectively.
  • Subjective Evaluation: The overall effectiveness of DeepISP was tested using human ratings (Mean Opinion Score, MOS) obtained via Amazon Mechanical Turk. Evaluators rated both full images and patches processed by DeepISP, the manufacturer's ISP, and the well-lit ground truth. The network's performance drew high MOS ratings close to those of well-lit images, validated further by predicting quality using a deep learning model trained for this purpose.
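For reference, the objective metric behind the reported 0.72 dB / 1.28 dB gains is peak signal-to-noise ratio, PSNR = 10·log10(MAX² / MSE). A minimal implementation for images with values in [0, 1]:

```python
# PSNR (peak signal-to-noise ratio) for flat pixel lists in [0, 1].
import math

def psnr(pred, target, max_val=1.0):
    """10 * log10(MAX^2 / MSE); higher is better."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr([0.5, 0.5], [0.4, 0.6]))  # ≈ 20.0 dB (MSE = 0.01)
```

Because PSNR depends only on mean squared error, a gain of ~1 dB corresponds to roughly a 21% reduction in MSE, which is why sub-decibel improvements are still considered meaningful on these benchmarks.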

Implications and Future Directions

The ability of DeepISP to surpass traditional ISP solutions in terms of both perceived quality and objective enhancement underscores its potential for practical deployments in low-light imaging scenarios. The paper suggests that an ISP fully learned from data can offer image quality improvements, coupled with potential reductions in computational complexity.

Future research could focus on expanding the capabilities of DeepISP to include tasks like motion deblurring and HDR processing, yielding a more robust and universally applicable image processing solution. Additionally, optimizing such models to improve performance of downstream computer vision tasks, like object recognition, could enhance the utility of processed images in broader AI applications.

DeepISP represents a significant advancement in the integration of deep learning techniques into image processing workflows, indicating a promising direction towards more adaptable and comprehensive processing methods in digital photography and beyond.
