Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring
The paper "Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring" by Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee presents a robust approach to the complex problem of image blur in dynamic scenes. Such blur arises from multiple sources, including camera shake, object motion, and depth variation, which makes traditional energy-optimization methods and synthetic training datasets insufficient. The paper proposes a novel multi-scale convolutional neural network (CNN) that restores sharp images end to end, without relying on predefined blur-kernel assumptions.
Methodology
Architecture
The authors propose a multi-scale architecture that mimics conventional coarse-to-fine optimization methods. The network exploits a hierarchical structure in which finer-scale processing is guided by features extracted at coarser scales. Because the network does not estimate explicit blur kernels, it avoids the artifacts that stem from kernel misestimation. The architecture stacks residual blocks (ResBlocks) to enable training of a deeper network, using a modified variant in which the rectified linear unit (ReLU) just before the block output is removed to improve training efficiency.
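The modified ResBlock can be sketched as follows in PyTorch. This is an illustrative reconstruction, not the authors' released code: the channel count and kernel size are assumptions, and the key point is the absence of an activation after the residual addition.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Modified residual block in the spirit of the paper: two convolutions
    with a skip connection, and no ReLU after the residual addition.
    Channel count (64) and kernel size (5) are illustrative assumptions."""
    def __init__(self, channels=64, kernel_size=5):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return x + out  # no ReLU here, per the modified design
```

Because the block preserves spatial dimensions and channel count, many of them can be stacked to deepen the network at each scale.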
Dataset
A significant contribution of the paper is the introduction of a new realistic dataset, known as the GOPRO dataset. This dataset is composed of 3214 pairs of blurry and sharp images captured using a high-speed camera, with the blur generated by averaging successive sharp frames. This realistic blurring avoids synthetic approximations and provides a robust basis for training and evaluation.
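The frame-averaging idea behind the dataset can be sketched as below. The paper points out that averaging should happen in the linear-intensity domain, so this sketch inverts an assumed gamma curve before averaging and re-applies it afterwards; the specific gamma value is an assumption for illustration.

```python
import numpy as np

def synthesize_blur(sharp_frames, gamma=2.2):
    """Approximate a blurry frame by averaging successive sharp frames,
    as in the GOPRO dataset construction. `sharp_frames` is a list of
    float arrays with values in [0, 1]; gamma=2.2 stands in for the
    (unknown) camera response function."""
    frames = np.stack(sharp_frames).astype(np.float64)  # (N, H, W[, C])
    linear = frames ** gamma            # undo gamma: back to linear intensity
    blurred = linear.mean(axis=0)       # temporal average ~ a long exposure
    return blurred ** (1.0 / gamma)     # re-encode for display
```

Averaging more frames corresponds to a longer effective exposure and therefore stronger motion blur.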
Loss Function
The training objective combines a multi-scale content loss with an adversarial loss. The multi-scale content loss penalizes the difference between the intermediate output at each scale and the sharp image downsampled to that scale, mirroring coarse-to-fine optimization. The adversarial loss, based on the Generative Adversarial Network (GAN) framework, helps generate more natural and structure-preserving images.
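A minimal sketch of the multi-scale content loss, assuming a simple 2x2 block-average downsampler (the actual resizing method and scale weights in the paper may differ). The full objective would add the adversarial term with some weighting coefficient.

```python
import numpy as np

def multiscale_content_loss(outputs, sharp):
    """Mean squared error between the network output at each scale and the
    ground-truth sharp image downsampled to that scale. `outputs[k]` is the
    prediction at scale k (k = 0 is the finest). Block-average downsampling
    is an assumption for illustration."""
    def downsample(img, factor):
        h, w = img.shape[:2]
        return img[:h - h % factor, :w - w % factor].reshape(
            h // factor, factor, w // factor, factor, *img.shape[2:]
        ).mean(axis=(1, 3))

    loss = 0.0
    for k, pred in enumerate(outputs):
        target = sharp if k == 0 else downsample(sharp, 2 ** k)
        loss += np.mean((pred - target) ** 2)
    return loss / len(outputs)
```

Supervising every scale, rather than only the final output, keeps the coarse branches from collapsing and mirrors how coarse-to-fine optimization constrains each level.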
Results
The performance of the proposed method is compared against state-of-the-art techniques on multiple datasets, including the GOPRO dataset, the Köhler dataset, and the dataset by Lai et al. The empirical results demonstrate significant improvements:
- On the GOPRO dataset, the proposed method achieved a PSNR of 29.23 and an SSIM of 0.9162, outperforming the methods by Kim and Lee (PSNR 23.64, SSIM 0.8239) and Sun et al. (PSNR 24.64, SSIM 0.8429).
- On the Köhler dataset, the method's PSNR and MSSIM were 26.48 and 0.8079, respectively, surpassing the aforementioned techniques.
- Qualitative assessments on the Lai et al. dataset and real dynamic scenes further validate the robustness of this approach.
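To make the reported numbers concrete, the PSNR figures above follow the standard definition, which can be computed as in this small sketch (images assumed to be float arrays scaled to [0, max_val]):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB, the metric reported in the tables:
    10 * log10(max_val^2 / MSE)."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A roughly 5 dB gap, as on the GOPRO dataset, corresponds to about a 3x reduction in mean squared error, which is a substantial margin for image restoration.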
Practical and Theoretical Implications
Practically, the proposed method provides a reliable tool for enhancing image clarity in a variety of applications such as handheld photography, video processing, and surveillance where dynamic blur is a common problem. The multi-scale architecture and the dataset introduced can be foundational for further research and development in image deblurring.
Theoretically, the work challenges the traditional reliance on explicit blur kernel estimation and synthetic datasets, demonstrating the effectiveness of learning directly from realistic blur scenarios. The employment of adversarial training in this context also opens avenues for further exploration of GAN-based approaches in image restoration tasks.
Future Directions
Future research might explore:
- Extending the model to handle even more complex scenes with mixed motion and lighting conditions.
- Integrating additional high-level semantic information to further improve the deblurring quality.
- Reducing computational costs, potentially through network pruning or more efficient architectures, to enable real-time deblurring on low-power devices.
In summary, this paper provides a comprehensive and effective solution to the problem of dynamic scene deblurring, significantly advancing the current state-of-the-art while offering practical benefits and new directions for future research.