An Overview of Real-world Video Deblurring Methods and Benchmark Dataset
Real-world video deblurring remains a compelling problem in computer vision due to the inherent complexity of spatially and temporally varying blur, coupled with the need for low computational cost. The paper by Zhong et al. addresses these challenges by introducing an Efficient Spatio-Temporal Recurrent Neural Network (ESTRNN) and a novel real-world benchmark dataset, BSD, for video deblurring.
Core Technical Contributions
The authors propose a notable advancement in network efficiency through the integration of residual dense blocks within Recurrent Neural Network (RNN) cells to enhance spatial feature extraction. The ESTRNN model further employs a global spatio-temporal attention module designed to fuse effective hierarchical features from neighboring frames, facilitating superior deblurring of the current frame. This construction allows for a balanced trade-off between computational cost and deblurring quality, enabling potential deployment on resource-limited devices such as smartphones.
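To make the architectural idea concrete, the following is a minimal, illustrative sketch of a recurrent cell built around a residual dense block. It is not the authors' implementation: it uses 1-D feature vectors and affine layers in place of convolutional feature maps, and all names, widths, and layer counts are assumptions chosen for brevity. What it does show is the two defining traits of the design: dense connectivity (each layer sees the concatenation of all earlier outputs) and local residual learning (the block input is added back after fusion), wrapped inside a recurrent state update.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    """Affine layer standing in for a convolution in this toy sketch."""
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

class ToyRDBCell:
    """Toy recurrent cell with a residual dense block (RDB).

    Each dense layer receives the concatenation of the block input and
    all previous layer outputs; a final fusion layer projects back to
    the feature width, and a residual connection adds the block input.
    """

    def __init__(self, feat=16, growth=8, layers=3):
        self.feat = feat
        self.ws, self.bs = [], []
        width = feat
        for _ in range(layers):
            self.ws.append(rng.standard_normal((width, growth)) * 0.1)
            self.bs.append(np.zeros(growth))
            width += growth  # dense connectivity grows the input width
        self.w_fuse = rng.standard_normal((width, feat)) * 0.1
        self.b_fuse = np.zeros(feat)
        # Recurrent mixing of the previous hidden state with current features.
        self.w_h = rng.standard_normal((2 * feat, feat)) * 0.1
        self.b_h = np.zeros(feat)

    def rdb(self, x):
        feats = [x]
        for w, b in zip(self.ws, self.bs):
            feats.append(relu(linear(np.concatenate(feats, axis=-1), w, b)))
        fused = linear(np.concatenate(feats, axis=-1), self.w_fuse, self.b_fuse)
        return x + fused  # local residual learning

    def step(self, frame_feat, hidden):
        mixed = relu(linear(np.concatenate([frame_feat, hidden], axis=-1),
                            self.w_h, self.b_h))
        return self.rdb(mixed)  # new hidden state

cell = ToyRDBCell()
hidden = np.zeros(16)
for t in range(5):  # process a short feature sequence recurrently
    hidden = cell.step(rng.standard_normal(16), hidden)
print(hidden.shape)  # (16,)
```

The dense concatenation is what lets each RDB layer reuse all earlier features cheaply, which is the efficiency argument the paper makes for placing RDBs inside the recurrent cell.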
The paper identifies a critical gap in the availability of real-world benchmark datasets, because the majority of existing datasets are synthetically generated and thus fail to capture the intricacies of real-world blur. To address this, the authors introduce the Beam-splitter Deblurring Dataset (BSD), captured using an ingeniously designed co-axis beam splitter acquisition system. This dataset provides paired blurry/sharp video clips under diverse motion scenarios and blur intensities.
Experimental Evaluations
Quantitative and qualitative evaluations underscore the performance merits of ESTRNN over state-of-the-art deblurring methods. In terms of computational cost, ESTRNN demonstrates superior efficiency, achieving strong results with significantly reduced resources. Notably, the model's ability to efficiently leverage spatio-temporal dependencies enables it to achieve high PSNR and SSIM scores on both synthetic datasets (GOPRO and REDS) and the newly introduced BSD dataset.
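For readers unfamiliar with the headline metric, PSNR compares a restored frame against its sharp ground truth via the mean squared error: PSNR = 10·log10(MAX² / MSE), in decibels, where MAX is the peak pixel value (255 for 8-bit frames). A short self-contained example (the frame sizes and noise level here are illustrative, not from the paper):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a "sharp" frame and a noisy stand-in for a deblurred estimate.
rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
restored = np.clip(sharp.astype(np.float64)
                   + rng.normal(0.0, 5.0, size=sharp.shape), 0, 255)
score = psnr(sharp, restored)
print(round(score, 1))  # roughly 34 dB for noise with sigma ≈ 5
```

Higher PSNR means the restored frame is numerically closer to the ground truth; SSIM complements it by scoring local structural similarity rather than raw pixel error.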
The ESTRNN baseline surpasses alternative methodologies such as IFI-RNN, PVDNet, and CDVD-TSP in both cost-efficiency and deblurring performance. These substantial improvements are attributed to the novel application of the residual dense block structure within RNN cells and the selective feature fusion enabled by the spatio-temporal attention module.
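The "selective feature fusion" idea can be illustrated with a deliberately simplified attention sketch. This is not the paper's module: here each neighboring frame's feature vector is weighted by its scaled dot-product similarity to the current frame and blended back in, whereas the actual global spatio-temporal attention operates on hierarchical convolutional features. All shapes and names below are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def fuse_neighbors(center, neighbors):
    """Weight neighboring-frame features by similarity to the current
    frame, then blend them in: a simplified stand-in for selective
    spatio-temporal feature fusion."""
    scores = np.array([center @ n / np.sqrt(center.size) for n in neighbors])
    weights = softmax(scores)  # more relevant neighbors get larger weights
    context = sum(w * n for w, n in zip(weights, neighbors))
    return center + context, weights

rng = np.random.default_rng(1)
center = rng.standard_normal(8)
neighbors = [rng.standard_normal(8) for _ in range(4)]
fused, weights = fuse_neighbors(center, neighbors)
print(fused.shape, round(float(weights.sum()), 6))  # (8,) 1.0
```

The point of the weighting is that frames whose content aligns with the current frame contribute more to its restoration, while misaligned or uninformative frames are suppressed rather than averaged in uniformly.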
Implications and Future Directions
The introduction of BSD presents substantial implications for future research. Models trained on real-world datasets demonstrate enhanced generalization capabilities over their synthetic counterparts. This underlines the significance of real-world data acquisition for model training to ensure applicability across varied settings. Furthermore, BSD aids in unveiling limitations of synthetic datasets, serving as a benchmark for assessing model robustness.
Future research could pursue the exploration of advanced alignment techniques and fusion strategies that further exploit higher-order dependencies in video data. Additionally, advancements in real-time processing capabilities and reduced latency could broaden the applicability of such systems in consumer electronics and automotive industries.
In summary, the work of Zhong et al. stimulates ongoing discourse in video deblurring. Through the carefully architected ESTRNN and the accompanying BSD dataset, this research encourages the continual pursuit of effective, resource-efficient deblurring solutions, steering toward more accurate and universally applicable methodologies in computer vision.