- The paper introduces a depth-aware framework that integrates optical flow and depth maps to enhance interpolation accuracy and reduce artifacts.
- It employs a depth-aware flow projection layer and hierarchical feature learning to effectively handle occlusions and large object motion.
- Experimental results on multiple benchmarks show improved PSNR, SSIM, and lower interpolation errors compared to prior methods.
Depth-Aware Video Frame Interpolation
The paper "Depth-Aware Video Frame Interpolation" introduces a method for synthesizing intermediate frames in video sequences that explicitly uses depth cues to enhance the interpolation quality. This approach addresses the limitations of previous methods that struggled with large object motion and occlusions, two prevalent challenges in video frame interpolation tasks.
Methodology
The authors introduce a novel depth-aware component into the video frame interpolation pipeline: a depth-aware flow projection layer. When multiple flow vectors project onto the same pixel of the intermediate frame, the layer preferentially samples the closer object over the farther one. This makes occlusions explicit at projection time, a common source of artifacts in earlier interpolation methods. By estimating both optical flow and depth maps from the input frames, the model can warp the input frames (and their features) toward the intermediate time step more faithfully.
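The depth-weighted aggregation behind this projection can be written roughly as follows (notation simplified here; see the paper for the exact definition):

```latex
F_{t \to 0}(\mathbf{x}) \;=\; -\,t \cdot
\frac{\sum_{\mathbf{y} \in \mathcal{S}(\mathbf{x})} w_0(\mathbf{y})\, F_{0 \to 1}(\mathbf{y})}
     {\sum_{\mathbf{y} \in \mathcal{S}(\mathbf{x})} w_0(\mathbf{y})},
\qquad
w_0(\mathbf{y}) \;=\; \frac{1}{D_0(\mathbf{y})},
```

where S(x) is the set of pixels y in frame 0 whose forward flow passes through x at time t, and D_0 is the estimated depth of frame 0. Because the weights are inverse depths, closer objects dominate the projected flow; the symmetric expression gives the flow toward frame 1.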
Key components include:
- Depth-Aware Flow Projection Layer: This layer enhances flow aggregation by considering depth information, thus improving motion boundary clarity in the generated frames.
- Hierarchical Feature Learning: The approach learns hierarchical contextual features specifically for interpolation rather than reusing features from networks pre-trained on unrelated tasks, which yields a more context-aware synthesis.
- Adaptive Warping Layer: The warping layer combines optical flow with locally estimated interpolation kernels, so each output pixel is synthesized by sampling from a larger local neighborhood than flow alone would allow (a sketch of this flow-plus-kernel sampling follows this list).
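To make the flow-plus-kernel idea concrete, here is a minimal PyTorch-style sketch that warps a frame by blending a small neighborhood around each flow-displaced sample with per-pixel kernel weights. Function and tensor names are illustrative, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def adaptive_warp(frame, flow, kernels, k=4):
    """Warp `frame` with `flow`, blending a k x k neighborhood around each
    flow-displaced sample using per-pixel interpolation kernels.

    frame:   (B, C, H, W) input frame (or any feature map)
    flow:    (B, 2, H, W) optical flow (dx, dy) toward the target time
    kernels: (B, k*k, H, W) per-pixel blending weights (e.g. from a kernel net)
    """
    B, C, H, W = frame.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W)

    weights = torch.softmax(kernels, dim=1)                        # normalize blend weights
    offsets = torch.arange(k, dtype=torch.float32, device=frame.device) - (k - 1) / 2
    out = torch.zeros_like(frame)

    idx = 0
    for dy in offsets:
        for dx in offsets:
            # Flow-displaced position plus the local kernel offset.
            px = base[0] + flow[:, 0] + dx                         # (B, H, W)
            py = base[1] + flow[:, 1] + dy
            # Normalize to [-1, 1] for grid_sample.
            gx = 2.0 * px / (W - 1) - 1.0
            gy = 2.0 * py / (H - 1) - 1.0
            grid = torch.stack((gx, gy), dim=-1)                   # (B, H, W, 2)
            sample = F.grid_sample(frame, grid, align_corners=True)
            out += weights[:, idx:idx + 1] * sample
            idx += 1
    return out
```

In the paper, this kind of warping is applied not only to the input frames but also to the depth maps and contextual features, so everything the synthesis stage sees is aligned to the intermediate time step.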
All components of the framework are differentiable, so the model can be trained end to end without auxiliary supervision such as explicit occlusion masks, and the resulting architecture remains compact and efficient.
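A high-level sketch of how the differentiable pieces compose into a single trainable pipeline is shown below. Every sub-network here is a placeholder argument (for example, the `warp` callable could be the flow-plus-kernel sketch above), and the exact inputs to each stage differ in the actual implementation:

```python
import torch

def interpolate_frame(frame0, frame1, t, flow_net, depth_net, context_net,
                      kernel_net, project_flow, warp, synthesis_net):
    """Illustrative end-to-end forward pass. Every step is differentiable, so
    the synthesis loss can back-propagate into all sub-networks. All module
    arguments are placeholders, not the paper's actual components."""
    # Bidirectional optical flow and per-frame depth / contextual features.
    flow_01 = flow_net(frame0, frame1)
    flow_10 = flow_net(frame1, frame0)
    depth0, depth1 = depth_net(frame0), depth_net(frame1)
    ctx0, ctx1 = context_net(frame0), context_net(frame1)

    # Depth-aware projection of the flows to the intermediate time t.
    flow_t0 = project_flow(flow_01, depth0, t)
    flow_t1 = project_flow(flow_10, depth1, 1.0 - t)

    # Warp frames, depth maps, and contextual features toward time t.
    kernels = kernel_net(frame0, frame1)
    warped0 = warp(torch.cat([frame0, depth0, ctx0], dim=1), flow_t0, kernels)
    warped1 = warp(torch.cat([frame1, depth1, ctx1], dim=1), flow_t1, kernels)

    # A synthesis network blends the warped inputs into the output frame.
    return synthesis_net(torch.cat([warped0, warped1], dim=1))
```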
Results
Quantitative evaluations on several datasets, including Middlebury, UCF101, Vimeo90K, and an HD test set, show that the proposed model, DAIN, outperforms prior methods, with higher PSNR and SSIM and the largest gains on sequences with complex motion and occlusions.
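As a reminder of what the headline metric measures, PSNR is a log-scaled function of the mean squared error between the interpolated frame and the ground-truth frame. A minimal NumPy version is below; the paper's exact evaluation protocol (color space, cropping, averaging) may differ:

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    pred = pred.astype(np.float64)
    target = target.astype(np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```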
On the Middlebury benchmark, DAIN achieves the best performance in terms of normalized interpolation error (NIE) and is competitive on interpolation error (IE). Qualitatively, the interpolated frames show fewer artifacts, sharper motion boundaries, and clearer object contours, which the authors attribute to the depth-aware flow projection.
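The Middlebury metrics are pixel-space errors: IE is the root-mean-square difference from the ground-truth frame, while NIE divides each squared difference by the local ground-truth gradient energy before averaging, so errors near strong edges count less. The sketch below follows the commonly cited definitions; consult the benchmark for the exact constants and gradient operator:

```python
import numpy as np

def interpolation_error(pred, gt):
    """Root-mean-square interpolation error (IE) over all pixels."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

def normalized_interpolation_error(pred, gt, eps=1.0):
    """Gradient-normalized interpolation error (NIE): squared differences are
    down-weighted where the ground-truth frame has strong gradients."""
    pred = pred.astype(np.float64)
    gt = gt.astype(np.float64)
    gy, gx = np.gradient(gt)[:2]            # per-axis intensity gradients
    grad_energy = gx ** 2 + gy ** 2
    return np.sqrt(np.mean((pred - gt) ** 2 / (grad_energy + eps)))
```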
Implications and Future Directions
The integration of depth information into video frame interpolation is a meaningful advance that could also benefit related tasks such as video editing, film restoration, and novel view synthesis. The model's efficiency and compact size further suggest potential for real-time video applications.
Future research might improve depth estimation accuracy or incorporate unsupervised learning to strengthen performance on unconstrained real-world video. Tighter joint estimation of depth and optical flow is another promising direction and could yield further gains in both efficiency and interpolation quality.
In conclusion, the paper presents a comprehensive and well-validated framework that sets a new benchmark for video frame interpolation through the innovative use of depth cues, offering a robust foundation for further exploration in depth-aware methodologies in computer vision.