- The paper introduces a novel spatio-temporal network that integrates motion compensation for efficient real-time video super-resolution.
- It leverages sub-pixel convolution within CNNs to map low-resolution inputs to high-resolution outputs, cutting computational cost by up to 30% at comparable quality, or improving PSNR by 0.2 dB at equal cost.
- The end-to-end trainable framework ensures temporal coherence, making it highly applicable to high-definition streaming, medical imaging, and satellite video enhancements.
Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation
This paper presents a significant advancement in video super-resolution (SR) through spatio-temporal sub-pixel convolution networks built on convolutional neural networks (CNNs). The authors propose a framework that exploits temporal redundancies across frames for improved reconstruction accuracy while maintaining real-time processing speeds.
Overview
The research addresses the challenge of reconstructing high-resolution (HR) videos from low-resolution (LR) inputs. Traditional SR methods have primarily focused on single image reconstruction, while this paper extends the concept to video by incorporating temporal data, thus enabling better utilization of inherent video correlations.
The methodology leverages three primary techniques:
- Spatio-Temporal Networks: The paper introduces various fusion strategies to process consecutive video frames. These include early fusion, slow fusion, and 3D convolutions. The comparative analysis revealed that incorporating multiple frames allows the network to reduce computational costs or enhance SR accuracy.
- Motion Compensation: A novel approach based on spatial transformer networks performs efficient motion compensation. It estimates and compensates for inter-frame motion, yielding temporally consistent output videos.
- Sub-Pixel Convolution: The use of sub-pixel convolution efficiently maps LR to HR space without preliminary upscaling, drastically reducing computational demands.
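The core of sub-pixel convolution is a channel-to-space rearrangement: the network produces r² feature channels per output channel in LR space, which are then shuffled into an image r times larger. Below is a minimal NumPy sketch of just that rearrangement step (the learned convolutions preceding it are omitted, and the function name `pixel_shuffle` is ours, not the paper's):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) feature maps into a (C, H*r, W*r) image."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# Each LR pixel's r^2 channels become an r-by-r HR patch:
lr_features = np.arange(16, dtype=float).reshape(4, 2, 2)  # r=2, C=1
hr = pixel_shuffle(lr_features, 2)                          # shape (1, 4, 4)
```

Because the convolutions run entirely in LR space and upscaling is a free reshuffle, the per-frame cost scales with the LR resolution rather than the HR one.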
Key Contributions
- Algorithm Efficiency: Relative to state-of-the-art single-frame models, the proposed spatio-temporal networks achieve comparable quality with up to a 30% reduction in computational cost; alternatively, they provide a 0.2 dB improvement in PSNR at no additional cost.
- End-to-End Trainability: The motion compensation mechanism is integrated using differentiable spatial transformers, allowing joint optimization with the SR network. This end-to-end approach delivers higher accuracy and more temporally coherent reconstruction.
- Comparative Analysis: In experiments on publicly available datasets, the proposed method consistently surpasses existing approaches in both efficiency and quantitative metrics such as PSNR and MOVIE.
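End-to-end trainability hinges on the warping inside the spatial transformer being differentiable, which bilinear sampling provides. The following is a minimal NumPy illustration of backward warping a grayscale frame by a dense flow field (a sketch of the sampling step only; the paper's flow-estimation network and the autograd machinery are not shown, and the `(dy, dx)` flow convention here is our assumption):

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp `frame` by a per-pixel flow with bilinear sampling.

    frame: (H, W) grayscale image; flow: (H, W, 2) holding (dy, dx).
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Sampling coordinates, clipped to the image border.
    sy = np.clip(ys + flow[..., 0], 0, h - 1)
    sx = np.clip(xs + flow[..., 1], 0, w - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = sy - y0, sx - x0
    # Bilinear blend of the four neighbouring pixels.
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

Because the output is a smooth function of the flow values, gradients can propagate from the SR loss back into the motion-estimation module, which is what allows the two to be optimized jointly.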
Implications and Future Work
The implications of this research are substantial for both practical applications and theoretical advancements. Real-time SR applications such as high-definition streaming, medical imaging, and satellite video enhancement stand to benefit significantly from this efficient and precise approach.
The proposed methodology opens avenues for further exploration into more complex fusion strategies and deeper integration of motion dynamics into the network. Future work could also extend the framework to larger scaling factors or adapt it to varying video content complexities without sacrificing real-time performance.
In conclusion, this paper provides a detailed framework for advancing video super-resolution, combining spatio-temporal processing and motion compensation within a highly efficient neural network model. The proposed advancements mark significant progress in achieving real-time, high-quality video enhancement.