
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution (2104.10642v2)

Published 21 Apr 2021 in cs.CV

Abstract: Space-time video super-resolution (STVSR) aims to increase the spatial and temporal resolutions of low-resolution and low-frame-rate videos. Recently, deformable convolution based methods have achieved promising STVSR performance, but they could only infer the intermediate frame pre-defined in the training stage. Besides, these methods undervalued the short-term motion cues among adjacent frames. In this paper, we propose a Temporal Modulation Network (TMNet) to interpolate arbitrary intermediate frame(s) with accurate high-resolution reconstruction. Specifically, we propose a Temporal Modulation Block (TMB) to modulate deformable convolution kernels for controllable feature interpolation. To well exploit the temporal information, we propose a Locally-temporal Feature Comparison (LFC) module, along with the Bi-directional Deformable ConvLSTM, to extract short-term and long-term motion cues in videos. Experiments on three benchmark datasets demonstrate that our TMNet outperforms previous STVSR methods. The code is available at https://github.com/CS-GangXu/TMNet.

Citations (82)

Summary

  • The paper introduces a Temporal Modulation Block (TMB) that adapts deformable convolution kernels for arbitrary frame interpolation.
  • The paper employs a Locally-temporal Feature Comparison (LFC) module to capture short-term motion cues and maintain temporal coherence.
  • The paper integrates a bi-directional deformable ConvLSTM to aggregate long-term motion details, achieving superior PSNR and SSIM metrics compared to prior methods.

Overview of Temporal Modulation Network for Controllable STVSR

The paper presents a novel approach to space-time video super-resolution (STVSR), which aims to enhance both the spatial and temporal resolutions of videos, for example upscaling Full High Definition (FHD) content to match Ultra High Definition (UHD) displays. Traditional methods in this space often rely on cumbersome motion estimation techniques and can only interpolate frames at positions fixed during training. The proposed Temporal Modulation Network (TMNet) addresses these limitations by modulating deformable convolutions so that intermediate frames can be interpolated at arbitrary temporal positions.

Key Contributions

The authors of the paper introduce several components that are critical to achieving their goals:

  1. Temporal Modulation Block (TMB): The TMB is a novel component designed to modulate deformable convolution kernels in response to a temporal hyper-parameter, thus enabling the interpolation of video frames at arbitrary moments. This flexibility vastly extends the application scope of video enhancement technologies beyond fixed-frame-rate scenarios.
  2. Locally-temporal Feature Comparison (LFC) Module: This module is designed to capture short-term motion cues among adjacent frames, ensuring that interpolated frames maintain consistent motion, which is pivotal for perceptual quality.
  3. Bi-directional Deformable ConvLSTM: While not a novel contribution by itself, its deployment in conjunction with the LFC module is innovative. It aggregates long-term motion variability, enhancing the temporal coherence and spatial detail retrieval across entire video sequences.
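The core TMB idea, conditioning deformable-convolution offsets on a scalar temporal position, can be illustrated with a minimal pure-Python sketch. Everything here is an illustrative assumption (the function names, the `tanh` mapping, and the flat-list offset representation); the paper's actual TMB is a learned convolutional block that modulates offsets inside a deep network, not this toy function.

```python
import math

def tmb_modulation(t, weights, biases):
    # Hypothetical stand-in for the Temporal Modulation Block:
    # map the scalar temporal position t in [0, 1] to one
    # multiplicative factor per offset channel. With zero bias,
    # t = 0 yields zero modulation, i.e. no displacement (the
    # interpolated frame coincides with the reference frame).
    return [math.tanh(w * t + b) for w, b in zip(weights, biases)]

def modulate_offsets(base_offsets, factors):
    # Rescale each channel's deformable-conv offsets by its
    # temporal modulation factor (offset maps flattened to
    # lists for brevity).
    return [[v * f for v in chan] for chan, f in zip(base_offsets, factors)]

# Varying t smoothly steers the sampling offsets, which is what
# makes the interpolation position controllable at inference time.
base = [[1.0, 2.0], [-0.5, 0.5]]          # two offset channels
factors = tmb_modulation(0.5, [1.0, 1.0], [0.0, 0.0])
shifted = modulate_offsets(base, factors)
```

The controllability claimed in the paper corresponds to the fact that `t` is a free input at inference time rather than a value baked into the training targets.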

Numerical and Empirical Results

The proposed TMNet demonstrates superior performance across several benchmarks. It achieves higher PSNR and SSIM metrics compared to preceding methods like STARnet and Zooming Slow-Mo, underscoring its ability to maintain detail and temporal coherence. The flexibility in frame interpolation is showcased through experiments where arbitrary temporal points for interpolation are specified, without a significant drop in performance.

Practical and Theoretical Implications

From a practical standpoint, TMNet could benefit video capture devices and streaming platforms by allowing frame rates to be adjusted flexibly after recording. This could be particularly useful for applications like live sports broadcasting, where frame rates may need to be adjusted dynamically.

Theoretically, the concept of temporal modulation introduced in this paper could inspire future research into more generalized modulation mechanisms across other dimensions (e.g., spatial, chromatic) in neural networks. This work also indicates potential advancements in real-time processing capabilities for video content, fostering further integration of complex neural architectures into streaming technologies.

Future Directions

Future research may explore more sophisticated modulation techniques that adapt not only to temporal parameters but also to contextual features of the video content. Additionally, lighter-weight variants of TMNet could make the technology deployable on devices with more constrained computational resources.

In conclusion, TMNet is a significant step forward in STVSR, offering notable flexibility in frame interpolation and setting a new benchmark for the quality and adaptability of video enhancement methods.