EDVR: Video Restoration with Enhanced Deformable Convolutional Networks (1905.02716v1)

Published 7 May 2019 in cs.CV

Abstract: Video restoration tasks, including super-resolution, deblurring, etc, are drawing increasing attention in the computer vision community. A challenging benchmark named REDS is released in the NTIRE19 Challenge. This new benchmark challenges existing methods from two aspects: (1) how to align multiple frames given large motions, and (2) how to effectively fuse different frames with diverse motion and blur. In this work, we propose a novel Video Restoration framework with Enhanced Deformable networks, termed EDVR, to address these challenges. First, to handle large motions, we devise a Pyramid, Cascading and Deformable (PCD) alignment module, in which frame alignment is done at the feature level using deformable convolutions in a coarse-to-fine manner. Second, we propose a Temporal and Spatial Attention (TSA) fusion module, in which attention is applied both temporally and spatially, so as to emphasize important features for subsequent restoration. Thanks to these modules, our EDVR wins the champions and outperforms the second place by a large margin in all four tracks in the NTIRE19 video restoration and enhancement challenges. EDVR also demonstrates superior performance to state-of-the-art published methods on video super-resolution and deblurring. The code is available at https://github.com/xinntao/EDVR.

Citations (925)

Summary

  • The paper introduces EDVR, a video restoration framework that combines a Pyramid, Cascading and Deformable (PCD) alignment module with a Temporal and Spatial Attention (TSA) fusion module.
  • The method achieves state-of-the-art PSNR improvements, reaching 27.35 dB on Vid4 and 35.79 dB on Vimeo-90K for super-resolution tasks.
  • The framework offers practical benefits for video enhancement, effectively addressing challenges such as large motions and severe blurring.

EDVR: Video Restoration with Enhanced Deformable Convolutional Networks

The paper "EDVR: Video Restoration with Enhanced Deformable Convolutional Networks," addresses the intricate challenges in video restoration tasks, specifically focusing on super-resolution and deblurring. The method introduces a novel framework named EDVR, which leverages Enhanced Deformable Convolutions to manage the prevalent issues caused by large motions and misalignment in video sequences. The proposed framework combines a Pyramid, Cascading and Deformable (PCD) alignment module with a Temporal and Spatial Attention (TSA) fusion module to yield superior video restoration results.

Methodology

The EDVR framework is designed to handle substantial motions and intricate feature fusion, which are critical for high-quality video restoration. The main components of the framework are the PCD alignment module and the TSA fusion module.

PCD Alignment Module: The PCD alignment module aims to align features from multiple frames effectively. It utilizes deformable convolutions to handle large and complex motions by adopting a coarse-to-fine strategy. Features are first aligned at lower resolution levels, progressively refining the alignment at higher levels, similar in concept to optical flow estimation techniques. This hierarchical approach ensures the robustness of motion compensation, enhancing the alignment accuracy across frames.
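To make the coarse-to-fine alignment concrete, below is a minimal PyTorch sketch of two-level deformable feature alignment in the spirit of the PCD module. It is an illustrative simplification, not the released EDVR code: names such as `SimplePCDAlign` and `offset_l1` are invented here, the pooling-based pyramid stands in for the paper's strided convolutions, and the actual module uses a deeper pyramid plus an additional cascading refinement stage.

```python
# Simplified coarse-to-fine deformable alignment sketch (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d


class SimplePCDAlign(nn.Module):
    """Align a neighboring frame's features to the reference frame's features
    at two pyramid levels (coarse and fine), refining offsets coarse-to-fine."""

    def __init__(self, channels=64, deform_ks=3):
        super().__init__()
        offset_ch = 2 * deform_ks * deform_ks  # x/y offset per kernel sample
        # Offset predictors take concatenated [neighbor, reference] features.
        self.offset_l2 = nn.Conv2d(channels * 2, offset_ch, 3, padding=1)
        # The fine level also sees the upsampled coarse offsets (cascading idea).
        self.offset_l1 = nn.Conv2d(channels * 2 + offset_ch, offset_ch, 3, padding=1)
        self.dcn_l2 = DeformConv2d(channels, channels, deform_ks, padding=deform_ks // 2)
        self.dcn_l1 = DeformConv2d(channels, channels, deform_ks, padding=deform_ks // 2)

    def forward(self, nbr_l1, ref_l1):
        # Build a two-level feature pyramid (pooling stands in for strided convs).
        nbr_l2 = F.avg_pool2d(nbr_l1, 2)
        ref_l2 = F.avg_pool2d(ref_l1, 2)

        # Coarse level: predict offsets and align at low resolution.
        off_l2 = self.offset_l2(torch.cat([nbr_l2, ref_l2], dim=1))
        aligned_l2 = self.dcn_l2(nbr_l2, off_l2)

        # Fine level: upsample coarse offsets (doubled in magnitude, since the
        # spatial resolution doubles) and refine with full-resolution features.
        off_up = F.interpolate(off_l2, scale_factor=2, mode="bilinear",
                               align_corners=False) * 2
        off_l1 = self.offset_l1(torch.cat([nbr_l1, ref_l1, off_up], dim=1))
        aligned_l1 = self.dcn_l1(nbr_l1, off_l1)
        return aligned_l1, aligned_l2
```

The essential design choice is that sampling offsets are predicted from feature correspondences between the neighboring and reference frames rather than from explicit optical flow, and the coarse offsets are reused to condition offset prediction at the finer level, which is what makes large motions tractable.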

TSA Fusion Module: The TSA module addresses the challenge of effectively fusing aligned features from different frames. It employs temporal attention to assess the informativeness of each neighboring frame relative to the reference frame. Intra-frame spatial attention is then applied to highlight important regions within each frame, improving the overall reconstruction quality.
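The following is a minimal PyTorch sketch of this two-stage attention, assuming five aligned input frames with the reference at the center. Class and layer names (`SimpleTSAFusion`, `embed_ref`, `embed_nbr`) are illustrative rather than the authors' implementation, and refinements such as computing the spatial attention at multiple scales are omitted.

```python
# Simplified temporal-and-spatial attention fusion sketch (not the released code).
import torch
import torch.nn as nn


class SimpleTSAFusion(nn.Module):
    """Weight each aligned frame by its per-pixel similarity to the reference
    frame (temporal attention), fuse the frames, then modulate the fused
    features with a spatial attention map."""

    def __init__(self, channels=64, num_frames=5, center=2):
        super().__init__()
        self.center = center
        self.embed_ref = nn.Conv2d(channels, channels, 3, padding=1)
        self.embed_nbr = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels * num_frames, channels, 1)
        self.spatial_att = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, aligned):  # aligned: (B, T, C, H, W)
        b, t, c, h, w = aligned.shape
        ref_emb = self.embed_ref(aligned[:, self.center])             # (B, C, H, W)
        nbr_emb = self.embed_nbr(aligned.view(-1, c, h, w)).view(b, t, c, h, w)

        # Temporal attention: per-pixel correlation of each frame with the reference.
        corr = torch.sigmoid((nbr_emb * ref_emb.unsqueeze(1)).sum(dim=2, keepdim=True))
        weighted = (aligned * corr).view(b, t * c, h, w)               # (B, T*C, H, W)

        # Fuse across frames, then apply a spatial attention map.
        fused = self.fuse(weighted)
        att = torch.sigmoid(self.spatial_att(fused))
        return fused * att
```

In this scheme, frames or regions that are poorly aligned or heavily blurred receive lower weights, so they contribute less to the fused features passed on to the reconstruction stage.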

Performance and Results

The effectiveness of EDVR is demonstrated through its participation in the NTIRE 2019 video restoration and enhancement challenges, where it wins all four tracks, spanning video super-resolution and video deblurring under both clean and degraded conditions. Experimental results on standard datasets such as Vid4, Vimeo-90K, and the newly proposed REDS dataset further corroborate the superiority of the EDVR framework.

Quantitative Results: On the Vid4 dataset, EDVR achieves an average PSNR of 27.35 dB, surpassing state-of-the-art methods such as DUF and RBPN. On the Vimeo-90K-T dataset, the framework reaches a PSNR of 35.79 dB, significantly higher than its closest competitors. Results on the REDS4 dataset likewise show notable gains, with EDVR achieving 31.09 dB for video super-resolution (clean) and 34.80 dB for video deblurring, demonstrating its robustness on diverse video sequences with complex motion and blur.
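For readers interpreting these numbers, PSNR is the standard peak signal-to-noise ratio, 10·log10(MAX² / MSE), reported in decibels. A minimal sketch of the computation, assuming images normalized to [0, 1] (the paper's exact evaluation protocol, such as whether PSNR is measured on RGB or the Y channel, is not reproduced here):

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)
```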

Qualitative Results: Visual comparisons show that EDVR accurately restores fine details in frames exhibiting large motion and severe blurring. The examples in the paper illustrate that EDVR produces sharp, clear outputs where competing methods tend to yield blurry, less detailed results.

Implications and Future Directions

The EDVR framework presents significant practical and theoretical implications. Practically, its ability to handle large motions and effectively fuse frame features makes it a highly valuable tool for real-world applications in video enhancement, such as film restoration, surveillance, and high-definition broadcasting. Theoretically, the combined use of deformable convolutions and attention mechanisms in a hierarchical alignment framework sets a precedent for future developments in video restoration technologies.

Future developments in this area may include further refinement of the alignment and fusion strategies, exploration of more advanced attention mechanisms, and extension of EDVR to other video enhancement tasks such as denoising and de-blocking. Additionally, the dataset bias observed in the experiments remains an open problem: developing video restoration methods that perform consistently across diverse datasets is a natural next step.

In conclusion, the EDVR framework signifies a substantial advancement in video restoration, and its robust performance across different conditions highlights its potential to set new benchmarks in the field. The innovative integration of PCD alignment and TSA fusion modules offers a comprehensive solution to the challenges in video super-resolution and deblurring, paving the way for future research and application in enhancing video quality.