Spatio-Temporal Filter Adaptive Network for Video Deblurring
The paper introduces a method for video deblurring, a challenging task in dynamic scenes where camera shake and object motion produce spatially variant blur. Existing methods typically estimate optical flow either to align consecutive frames or to approximate blur kernels; when the flow estimate is inaccurate, these approaches introduce artifacts and fail to remove the blur effectively. To address these issues, the authors propose the Spatio-Temporal Filter Adaptive Network (STFAN), which integrates alignment and deblurring in a single framework.
Methodology
The proposed STFAN takes the blurry and restored frames from the previous time step together with the current blurry frame, and from them dynamically generates spatially adaptive filters for alignment and deblurring, avoiding a separate optical flow estimation stage. The key component is the Filter Adaptive Convolutional (FAC) layer, which applies these per-pixel filters to align the deblurred features of the previous frame with the current frame and to remove spatially variant blur from the current frame's features. A reconstruction network then fuses the two sets of features and restores the clear frame.
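To make the FAC operation concrete, here is a minimal PyTorch sketch of applying per-pixel, per-channel filters to a feature map. The function name and tensor layout are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def filter_adaptive_conv(features, filters, kernel_size=5):
    """Apply a spatially variant k x k filter at every pixel and channel.

    features: (B, C, H, W) feature map.
    filters:  (B, C * k * k, H, W) per-pixel filters predicted by the
              network (one k x k kernel per channel per location).
    Layout is an assumption for this sketch.
    """
    b, c, h, w = features.shape
    k = kernel_size
    # Gather the k x k neighborhood of every pixel: (B, C*k*k, H*W).
    patches = F.unfold(features, kernel_size=k, padding=k // 2)
    patches = patches.view(b, c, k * k, h, w)
    kernels = filters.view(b, c, k * k, h, w)
    # Weight each neighborhood by its own kernel and sum over the window.
    return (patches * kernels).sum(dim=2)
```

Because the kernels vary from pixel to pixel, a single FAC layer can express different motions and blur levels at different image locations, which a standard convolution with shared weights cannot.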
Results
Quantitative evaluations show that STFAN outperforms state-of-the-art methods on benchmark datasets and real-world videos in terms of accuracy, speed, and model size. It achieves higher PSNR and SSIM scores, confirming that it handles spatially variant blur more effectively than prior approaches.
Discussion
Integrating frame alignment and deblurring into a unified framework makes the pipeline both more accurate and more efficient. By dynamically generating adaptive filters for each frame, STFAN avoids explicit optical flow estimation, which is error-prone on blurry inputs. Because the FAC layers perform alignment and deblurring directly in the feature domain, the model is also more tolerant of residual alignment errors.
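A schematic of the per-frame recurrence might look as follows. All sub-networks are placeholder single convolutions standing in for the paper's deeper encoder, filter-generation, and reconstruction branches, and `filter_adaptive_conv` is the sketch from the Methodology section; channel counts are assumptions.

```python
import torch
import torch.nn as nn

class STFANStep(nn.Module):
    """One recurrent step (schematic); modules are placeholders."""

    def __init__(self, feat_ch=32, k=5):
        super().__init__()
        self.k = k
        # Stand-ins for the paper's sub-networks.
        self.encoder = nn.Conv2d(3, feat_ch, 3, padding=1)
        self.filter_gen = nn.Conv2d(9, 2 * feat_ch * k * k, 3, padding=1)
        self.reconstruct = nn.Conv2d(2 * feat_ch, 3, 3, padding=1)

    def forward(self, prev_blurry, prev_restored, cur_blurry, prev_feats):
        feats = self.encoder(cur_blurry)
        # Both filter sets come from the frame triplet; no optical flow.
        gen_in = torch.cat([prev_blurry, prev_restored, cur_blurry], dim=1)
        f_align, f_deblur = self.filter_gen(gen_in).chunk(2, dim=1)
        aligned = filter_adaptive_conv(prev_feats, f_align, self.k)
        deblurred = filter_adaptive_conv(feats, f_deblur, self.k)
        restored = self.reconstruct(torch.cat([aligned, deblurred], dim=1))
        # The deblurred features are carried to the next time step.
        return restored, deblurred
```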
Implications and Future Directions
Performing alignment and deblurring in the feature domain without explicit motion estimation makes the method a promising candidate for real-time applications such as surveillance and autonomous navigation. Future work could tune the filter size to balance restoration quality against computational cost in increasingly complex video scenarios, and could extend the framework to other video tasks involving spatially variant degradations, such as video super-resolution or denoising. Integrating the approach with reinforcement learning might further improve adaptability across environments and motion dynamics.
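As a back-of-the-envelope illustration of that filter-size trade-off: the generated filter tensor holds one k x k kernel per channel per pixel, so its size grows quadratically with the filter width. The feature-map dimensions below are hypothetical, not taken from the paper.

```python
def fac_filter_megabytes(h, w, channels, k, bytes_per_elem=4):
    """Memory of an (h, w, channels * k * k) filter tensor in MB."""
    return h * w * channels * k * k * bytes_per_elem / 2**20

# Illustrative quarter-resolution feature map (dimensions assumed).
for k in (3, 5, 7):
    print(f"k={k}: {fac_filter_megabytes(180, 320, 128, k):.0f} MB")
```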