- The paper introduces a one-stage framework that integrates temporal interpolation with spatial super-resolution for efficient video processing.
- It employs deformable feature interpolation and a novel deformable ConvLSTM to handle complex motions and improve temporal alignment.
- Experimental results demonstrate that Zooming SlowMo achieves higher PSNR/SSIM scores, speeds up processing, and reduces model size compared to two-stage methods.
Overview of Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution
The paper "Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution" presents a novel method for generating high-resolution, high-frame-rate videos from low-resolution, low-frame-rate input sequences. The authors address the limitations of traditional two-stage methods by proposing a one-stage approach that simultaneously handles temporal interpolation and spatial super-resolution.
Key Contributions
- One-Stage STVSR Network: The proposed Zooming SlowMo (ZSM) framework integrates temporal interpolation and spatial super-resolution into a single stage. This strategy contrasts with traditional methods that decompose the problem into video frame interpolation (VFI) and video super-resolution (VSR) as separate tasks. The unified approach leverages the intrinsic connection between temporal and spatial aspects, resulting in more efficient processing and a reduced model size.
- Deformable Feature Interpolation: Instead of explicitly reconstructing low-resolution intermediate frames, ZSM employs a feature-level interpolation module built on deformable convolutional layers. By learning spatially adaptive sampling offsets, the module captures temporal dynamics more faithfully, particularly under complex motion.
- Deformable ConvLSTM: A novel deformable ConvLSTM performs temporal alignment and context aggregation jointly. By incorporating global temporal information, it improves the handling of large motions in videos.
- Guided Feature Interpolation Learning: To further improve temporal consistency, the authors introduce an additional cyclic interpolation loss, which leverages natural video coherence as a supervisory signal.
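The deformable ConvLSTM above can be summarized as a standard ConvLSTM whose previous hidden and cell states are first aligned to the current features via deformable convolution. The following is a sketch of this structure (the exact offset-prediction details follow the paper); $\Delta p$ denotes learned offsets, $*$ convolution, and $\odot$ the Hadamard product:

$$
\begin{aligned}
h^{a}_{t-1} &= \mathrm{DConv}(h_{t-1}, \Delta p_h), \qquad
c^{a}_{t-1} = \mathrm{DConv}(c_{t-1}, \Delta p_c),\\
i_t &= \sigma(W_{xi} * x_t + W_{hi} * h^{a}_{t-1} + b_i),\\
f_t &= \sigma(W_{xf} * x_t + W_{hf} * h^{a}_{t-1} + b_f),\\
o_t &= \sigma(W_{xo} * x_t + W_{ho} * h^{a}_{t-1} + b_o),\\
c_t &= f_t \odot c^{a}_{t-1} + i_t \odot \tanh(W_{xc} * x_t + W_{hc} * h^{a}_{t-1} + b_c),\\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
$$

Aligning $h_{t-1}$ and $c_{t-1}$ before the gate computations is what lets the recurrent state track large motions instead of blurring across misaligned features.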
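The contrast between the two-stage pipeline and ZSM's one-stage design can be illustrated with a toy data-flow sketch. This is not the paper's method: the learned interpolation and super-resolution networks are replaced with simple averaging and nearest-neighbor upsampling, and the function names (`upscale`, `two_stage`, `one_stage`) are illustrative only. The point is the shape of the computation: two independent passes versus one joint pass over shared inputs.

```python
import numpy as np

def upscale(f, s):
    """Nearest-neighbor upsampling (toy stand-in for a learned VSR model)."""
    return np.repeat(np.repeat(f, s, axis=0), s, axis=1)

def two_stage(frames, s=4):
    """VFI then VSR as two independent steps, as in prior pipelines."""
    interp = []
    for a, b in zip(frames[:-1], frames[1:]):
        interp += [a, 0.5 * (a + b)]      # toy frame interpolation
    interp.append(frames[-1])
    return [upscale(f, s) for f in interp]

def one_stage(frames, s=4):
    """Joint pass: each HR output frame is produced directly from the LR
    inputs in a single sweep, mimicking how ZSM shares features between
    the interpolation and super-resolution tasks."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out += [upscale(a, s), upscale(0.5 * (a + b), s)]
    out.append(upscale(frames[-1], s))
    return out
```

In this toy both routes produce identical outputs; in the real models the one-stage route avoids recomputing and re-encoding intermediate pixel-level frames, which is where the speed and size savings come from.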
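The core operation behind the deformable feature interpolation bullet, sampling a feature map at learned per-pixel offsets and blending the two neighbors, can be sketched in plain numpy. This is a simplified single-channel version: in the paper the offsets come from a small offset-prediction network and the blend is learned, whereas here both offsets and the 0.5/0.5 blend are fixed inputs for illustration.

```python
import numpy as np

def bilinear_sample(feat, offset_y, offset_x):
    """Sample feat (H, W) at (y + offset_y, x + offset_x) with bilinear
    interpolation; out-of-range locations are clamped to the border."""
    H, W = feat.shape
    ys = np.clip(np.arange(H)[:, None] + offset_y, 0, H - 1)
    xs = np.clip(np.arange(W)[None, :] + offset_x, 0, W - 1)
    y0 = np.floor(ys).astype(int); y1 = np.clip(y0 + 1, 0, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.clip(x0 + 1, 0, W - 1)
    wy = ys - y0; wx = xs - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def interpolate_features(f1, f2, off1, off2):
    """Feature-level interpolation: sample each neighbor's features at its
    offsets (given here as (dy, dx) arrays) and blend. The real module
    predicts the offsets and the blending weights with learned layers."""
    s1 = bilinear_sample(f1, off1[0], off1[1])
    s2 = bilinear_sample(f2, off2[0], off2[1])
    return 0.5 * (s1 + s2)  # simple average in place of learned blending
```

Because the sampling locations vary per pixel, the module can follow different motions in different image regions, which is what makes it more flexible than a flow-based warp with a single global model.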
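The cyclic interpolation loss can be illustrated with a toy consistency check: if the model interpolates midpoints between consecutive frames, then re-interpolating between those synthesized midpoints should recover the real middle frame. This sketch uses a simple average as a stand-in for the learned interpolation operator (`interp` and `cyclic_interpolation_loss` are illustrative names, not the paper's API):

```python
import numpy as np

def interp(a, b):
    """Stand-in for the learned feature interpolation; the real model
    uses deformable sampling and learned blending."""
    return 0.5 * (a + b)

def cyclic_interpolation_loss(f0, f1, f2):
    """Cycle consistency over a triplet of real frames: the midpoint of
    the two synthesized midpoints should land back on f1."""
    m01 = interp(f0, f1)      # synthesized frame between t=0 and t=1
    m12 = interp(f1, f2)      # synthesized frame between t=1 and t=2
    cycle = interp(m01, m12)  # should reconstruct the real frame f1
    return np.mean(np.abs(cycle - f1))
```

The appeal of this signal is that it needs no extra labels: the natural coherence of the input video itself supervises the interpolation module.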
Experimental Results
The paper presents extensive experiments on benchmark datasets, demonstrating that ZSM significantly surpasses state-of-the-art two-stage methods both quantitatively and qualitatively. On the Vimeo test set, for instance, ZSM achieves superior PSNR and SSIM scores across the fast-, medium-, and slow-motion subsets. In terms of efficiency, ZSM runs more than three times faster than the best two-stage counterpart, with a model roughly four times smaller.
Practical and Theoretical Implications
Practical Implications
- Increased Efficiency: The one-stage approach yields substantial computational savings, making the framework suitable for real-time applications like high-definition video synthesis where rapid processing is critical.
- Robustness in Noisy Conditions: Experiments involving input frames with noise and compression artifacts further demonstrate ZSM's robustness, suggesting utility in practical scenarios where video data is not pristine.
Theoretical Implications
- Unified Framework Design: By jointly addressing temporal and spatial resolution within one coherent framework, the method offers insights into solving coupled high-dimensional problems and may inform other domains that require simultaneous multi-dimensional scaling.
Future Directions
The success of Zooming SlowMo in training a joint video processing model without explicit supervision on intermediate low-resolution frames suggests several promising research directions. Future work could extend to adaptive models that dynamically adjust to varying resolutions and frame rates in real-time scenarios.
Furthermore, addressing the intrinsic temporal inconsistencies observed due to frame synthesis could be approached by incorporating explicit temporal coherence constraints, thus improving video smoothness and perceptual quality.
In conclusion, the introduction of ZSM marks a significant advancement in video super-resolution techniques, offering an efficient and robust solution that aligns with both current and emerging needs in video processing technologies.