- The paper introduces DIFRINT, an unsupervised framework that iteratively interpolates frames to achieve full-frame video stabilization.
- It employs U-Net and ResNet architectures with bidirectional optical flow to interpolate frames and reduce inter-frame jitter.
- Empirical results show near real-time processing at 15 fps with state-of-the-art stability, while preserving the original video content without cropping.
Deep Iterative Frame Interpolation for Full-frame Video Stabilization
The paper "Deep Iterative Frame Interpolation for Full-frame Video Stabilization" introduces an unsupervised deep learning approach to stabilize videos while maintaining the original frame content without cropping. The proposed method, termed DIFRINT (Deep Iterative FRame INTerpolation), emphasizes frame interpolation techniques to minimize inter-frame jitter in an unsupervised manner, which is a pioneering attempt for full-frame video stabilization.
Overview of Methodology
DIFRINT casts stabilization as a frame interpolation problem: for each frame, it synthesizes an intermediate frame between its two temporal neighbors, which implicitly averages out the spatial jitter between them. Unlike conventional stabilization methods that crop away unstable boundary regions, the intermediate frame is synthesized over the full frame, so boundary content is generated rather than discarded. Applying this interpolation iteratively progressively smooths the camera trajectory.
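A conceptual sketch of this iterative loop, in Python, is given below; the `interpolate` callable stands in for the learned interpolation network, and the iteration count is an illustrative assumption rather than the paper's exact setting.

```python
def stabilize(frames, interpolate, num_iters=3):
    """Iteratively replace each interior frame with an interpolation
    of its stabilized neighbors."""
    stabilized = list(frames)
    for _ in range(num_iters):
        updated = list(stabilized)
        for i in range(1, len(stabilized) - 1):
            # The original frame is also passed in so the network can
            # restore content that warping alone cannot recover.
            updated[i] = interpolate(stabilized[i - 1], stabilized[i + 1], frames[i])
        stabilized = updated
    return stabilized
```

Each pass pulls every frame toward the midpoint of its neighbors, so repeated passes act like a learned low-pass filter over the camera path while boundary frames anchor the sequence.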
The architecture pairs a U-Net with a ResNet-based refinement network to reconstruct high-quality interpolated frames. Bidirectional optical flow is used to warp the two adjacent frames halfway toward a pseudo-middle position; the U-Net fuses the warped frames into an intermediate frame, which the ResNet then refines using the original frame to restore detail. Training is unsupervised, guided by a pixel-wise loss and a perceptual loss, so no stabilized ground-truth videos are required.
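A minimal PyTorch sketch of one interpolation step is shown below, assuming a pretrained flow estimator (the paper uses a PWC-Net-style network); the halfway-warp approximation and the `flow_net`/`unet`/`resnet` interfaces are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp a frame (N,C,H,W) by a dense flow field (N,2,H,W)."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device),
        torch.arange(w, device=frame.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1,2,H,W)
    coords = grid + flow
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(frame, torch.stack((x, y), dim=-1), align_corners=True)

def interpolate_step(prev, nxt, orig, flow_net, unet, resnet):
    """One DIFRINT-style step: warp neighbors halfway, fuse, then refine."""
    flow_fwd = flow_net(prev, nxt)  # flow prev -> nxt
    flow_bwd = flow_net(nxt, prev)  # flow nxt -> prev
    # Halving each flow approximates warping a neighbor to the
    # pseudo-middle position (a standard interpolation approximation).
    prev_mid = warp(prev, 0.5 * flow_bwd)
    nxt_mid = warp(nxt, 0.5 * flow_fwd)
    # The U-Net fuses the two warped frames into an intermediate frame;
    # the ResNet refines it with guidance from the original frame.
    middle = unet(torch.cat([prev_mid, nxt_mid], dim=1))
    return resnet(torch.cat([middle, orig], dim=1))
```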
Technical Contributions
The salient features of DIFRINT include:
- Full-frame Stabilization: It avoids cropping altogether by generating stabilized frames that retain the original content.
- Unsupervised Training: The framework is trained without paired stable/unstable videos, so essentially any video collection can serve as training data, making it highly scalable (see the loss sketch after this list).
- Real-time Capability: The method achieves near real-time processing speed (15 fps), enabling efficient stabilization of video sequences.
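As referenced above, the unsupervised objective can be sketched as follows; the VGG layer cut-off and the weight `lam` are illustrative assumptions rather than the paper's exact hyperparameters. The key point is that the reconstruction target is the original input frame itself, so no stabilized ground truth enters the loss.

```python
import torch.nn as nn
from torchvision.models import vgg16

# Frozen VGG-16 feature extractor for the perceptual term
# (layer cut-off is an illustrative assumption).
vgg_features = vgg16(weights="DEFAULT").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

l1 = nn.L1Loss()
mse = nn.MSELoss()

def reconstruction_loss(pred, original, lam=0.1):
    """Pixel-wise L1 plus VGG perceptual loss, both measured against the
    ORIGINAL input frame -- no stabilized ground truth appears anywhere."""
    return l1(pred, original) + lam * mse(vgg_features(pred), vgg_features(original))
```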
Quantitative and Qualitative Evaluation
The authors conducted quantitative evaluations against existing state-of-the-art video stabilization methods, benchmarking performance with cropping ratio, distortion value, and stability score. DIFRINT consistently showed superior results, achieving a cropping ratio of 1 by construction (no cropping) and low distortion while matching or exceeding the stability of competing methods.
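For context, the stability score used in this literature is typically computed in the frequency domain of the estimated camera path; the sketch below follows that common formulation and may differ in detail from the paper's exact protocol.

```python
import numpy as np

def stability_score(path):
    """Fraction of camera-path motion energy in the low frequencies.

    `path` is a 1-D signal such as accumulated per-frame translation;
    higher scores mean smoother motion. This follows a formulation common
    in the stabilization literature (energy of the 2nd-6th frequency
    components over total energy, DC excluded).
    """
    spectrum = np.abs(np.fft.fft(path)) ** 2
    spectrum = spectrum[1 : len(spectrum) // 2]  # drop DC and mirrored half
    return float(spectrum[:5].sum() / spectrum.sum())
```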
Qualitatively, visual comparisons with other methods highlight DIFRINT's ability to preserve the visual content and synthesize missing boundary regions, in contrast to traditional methods that often suffer from significant cropping or distortion artifacts.
Implications and Future Directions
This research has notable implications for computer vision applications that require high-quality, stable video output, such as digital video editing and stabilization software. Moreover, training deep stabilization models without supervised datasets signals a shift toward more adaptive and scalable video processing pipelines.
Future research could explore adaptive stabilization adjustments based on detected motion types, giving users more dynamic control over the stabilization effect. Robustness to extreme motion blur and severe camera shake could also be improved to handle more challenging conditions.
Overall, DIFRINT represents a significant advance in video stabilization, particularly in its use of deep iterative frame interpolation, and provides a practical tool for producing high-quality stabilized videos without sacrificing the original content.