Overview of BasicVSR++: An Enhanced Approach to Video Super-Resolution
The paper "BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment" introduces BasicVSR++, an architecture designed to advance video super-resolution (VSR) through improved feature propagation and alignment mechanisms. By redesigning its predecessor, BasicVSR, the work captures spatiotemporal information from video sequences more effectively, achieving superior VSR results with a similar parameter budget.
At the core of this improvement are two primary contributions: second-order grid propagation and flow-guided deformable alignment. These innovations empower the recurrent framework to exploit temporal dependencies more effectively, allowing the model to handle misalignments across video frames and improve restoration quality, particularly in areas with occlusions or complex textures.
Key Contributions
- Second-Order Grid Propagation: The authors propose a shift from the existing linear propagation strategies to a grid-like structure enabling repeated bidirectional propagation of features. By incorporating a second-order Markov chain approach, the network relaxes dependencies on immediately preceding frames, allowing information to be aggregated from multiple spatiotemporal regions. This significantly enhances the flow of information across frames, resulting in a model more robust to occluded and intricate regions in videos.
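The second-order recurrence can be illustrated with a minimal sketch: each propagation step fuses the current frame's features with the features propagated one and two steps back, rather than only the immediately preceding one. The function names and the toy weighted-average fusion below are illustrative assumptions; in the actual model the fusion is performed by residual blocks, and the pass is repeated bidirectionally in a grid.

```python
import numpy as np

def second_order_propagate(frame_feats, fuse):
    """One forward pass of second-order propagation (sketch).

    frame_feats: list of per-frame feature arrays.
    fuse: callable combining (current, prev1, prev2) -> propagated feature.
    Unlike a first-order recurrence, each step also sees the feature
    propagated two steps back, relaxing the strict dependency on the
    immediately preceding frame.
    """
    zeros = np.zeros_like(frame_feats[0])
    props = []
    for t, x in enumerate(frame_feats):
        h1 = props[t - 1] if t >= 1 else zeros  # first-order neighbor
        h2 = props[t - 2] if t >= 2 else zeros  # second-order neighbor
        props.append(fuse(x, h1, h2))
    return props

# Toy fusion standing in for the model's residual blocks (hypothetical weights).
fuse = lambda x, h1, h2: 0.5 * x + 0.3 * h1 + 0.2 * h2
feats = [np.full((2, 2), float(t)) for t in range(4)]
out = second_order_propagate(feats, fuse)
```

In the full grid scheme, such passes are stacked and alternate direction (forward, backward, forward, ...), so features revisit each time step several times; the sketch shows a single forward pass only.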
- Flow-Guided Deformable Alignment: This component leverages optical flow to pre-align features before applying deformable convolutions. By learning offsets as a refinement over the optical flow, this method overcomes the inherent instability and training difficulties faced in traditional deformable alignment approaches. This refinement reduces the training burden and promotes stable learning, providing a robust mechanism to handle misalignments effectively.
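The idea of using flow as a base and learning only a residual correction can be sketched as follows. This toy version uses nearest-neighbor warping and a plain re-warp in place of a true deformable convolution; `predict_residual` is a hypothetical stand-in for the small offset-prediction network, and all names here are illustrative, not the paper's API.

```python
import numpy as np

def flow_warp(feat, flow):
    """Backward-warp a 2D feature map by a dense flow field.

    flow[y, x] = (dy, dx); nearest-neighbor sampling for simplicity
    (the real model uses bilinear sampling).
    """
    h, w = feat.shape
    out = np.zeros_like(feat)
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            sy = min(max(int(round(y + dy)), 0), h - 1)
            sx = min(max(int(round(x + dx)), 0), w - 1)
            out[y, x] = feat[sy, sx]
    return out

def flow_guided_align(neighbor_feat, flow, predict_residual):
    """Flow-guided alignment sketch: optical flow gives a coarse
    correspondence; a small network (here the hypothetical
    `predict_residual`) predicts per-pixel offset residuals on top
    of it, so the learned offsets only encode a correction to the
    flow, which is what stabilizes training."""
    warped = flow_warp(neighbor_feat, flow)           # coarse pre-alignment
    residual = predict_residual(warped)               # learned offset refinement
    return flow_warp(neighbor_feat, flow + residual)  # refined sampling

# Usage: with a zero residual, alignment reduces to plain flow warping.
feat = np.arange(9, dtype=float).reshape(3, 3)
flow = np.zeros((3, 3, 2))
flow[..., 1] = 1.0  # sample one pixel to the right
zero_residual = lambda f: np.zeros(f.shape + (2,))
aligned = flow_guided_align(feat, flow, zero_residual)
```

The design point is that the offsets are parameterized as `flow + residual` rather than learned from scratch; per the paper, this avoids the offset-overflow instability that plagues directly trained deformable alignment.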
Numerical Results
BasicVSR++ demonstrates substantial PSNR improvements across benchmarks, surpassing its predecessor BasicVSR by 0.82 dB with a similar number of parameters. It outperforms established state-of-the-art models on datasets such as REDS4, Vimeo-90K-T, and Vid4 under both BI and BD degradation conditions. The method also earned top placements in the NTIRE 2021 challenges, underscoring its robust performance not only in super-resolution but in broader video restoration contexts.
Implications and Future Prospects
From a practical standpoint, the advancements in BasicVSR++ can be adapted beyond super-resolution to other video restoration tasks such as video deblurring and denoising, broadening the utility of the proposed techniques. The work provides a structured methodology to explore recurrent networks further, potentially inspiring future designs that incorporate hierarchical and cross-scale information flows in VSR.
Theoretically, the proposed interaction between optical flow and deformable convolutions offers a nuanced perspective into modulation-based alignment strategies, aligning closely with dynamic scene understanding. This bridge between explicit flow-based guidance and learnable deformable approaches may catalyze further research into hybrid models that can flexibly adapt to diverse video contents.
Conclusion
BasicVSR++ represents a meaningful leap in the video super-resolution domain, methodically addressing previous bottlenecks in propagation and alignment. Its balanced approach to performance and efficiency sets a critical precedent for future investigations into adaptive, scalable architectures in video processing. The principles underlying its design hold potential for broader application, emphasizing a critical direction for future research in AI-driven video enhancement technologies.