MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement

Published 20 Oct 2018 in cs.CV | (1810.08768v2)

Abstract: Motion estimation (ME) and motion compensation (MC) have been widely used for classical video frame interpolation systems over the past decades. Recently, a number of data-driven frame interpolation methods based on convolutional neural networks have been proposed. However, existing learning based methods typically estimate either flow or compensation kernels, thereby limiting performance on both computational efficiency and interpolation accuracy. In this work, we propose a motion estimation and compensation driven neural network for video frame interpolation. A novel adaptive warping layer is developed to integrate both optical flow and interpolation kernels to synthesize target frame pixels. This layer is fully differentiable such that both the flow and kernel estimation networks can be optimized jointly. The proposed model benefits from the advantages of motion estimation and compensation methods without using hand-crafted features. Compared to existing methods, our approach is computationally efficient and able to generate more visually appealing results. Furthermore, the proposed MEMC-Net can be seamlessly adapted to several video enhancement tasks, e.g., super-resolution, denoising, and deblocking. Extensive quantitative and qualitative evaluations demonstrate that the proposed method performs favorably against the state-of-the-art video frame interpolation and enhancement algorithms on a wide range of datasets.


Citations (306)


Summary

The paper introduces MEMC-Net, a neural network using an adaptive warping layer that integrates optical flow-based warping and learned interpolation kernels for synthesizing frames.
MEMC-Net matches or exceeds state-of-the-art methods on benchmark datasets, including UCF101 (PSNR 35.01 dB) and Vimeo90K (SSIM 0.9742).
The adaptive warping mechanism makes MEMC-Net versatile, applicable not only to video interpolation but also to other enhancement tasks like super-resolution and denoising.

Overview of MEMC-Net: A Motion Estimation and Compensation Driven Neural Network

The paper introduces MEMC-Net, a neural network architecture for video frame interpolation built around two components: motion estimation (ME) and motion compensation (MC). The framework keeps the ME/MC structure of classical video interpolation systems but implements both stages with convolutional neural networks (CNNs).

Central to MEMC-Net is an adaptive warping layer that combines optical flow and interpolation kernels to synthesize target frame pixels. The same layer carries over to video enhancement tasks beyond interpolation, including super-resolution, denoising, and deblocking. Because the model uses no hand-crafted features, it learns both components from data, and the paper reports gains in both computational efficiency and visual quality.
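To make the idea of combining flow and kernels concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a single-channel image, a per-pixel flow stored as (dx, dy), and a per-pixel kernel of odd size R x R; the function names `bilinear_sample` and `adaptive_warp` and the exact weighting rule are illustrative assumptions. Each output pixel is a kernel-weighted sum of bilinear samples taken around the flow-displaced location.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample a 2-D image at float coordinates, clamping at the borders."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bot = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bot

def adaptive_warp(img, flow, kernels):
    """Hypothetical simplified adaptive warp.

    img:     (H, W) source frame
    flow:    (H, W, 2) displacement per target pixel, stored as (dx, dy)
    kernels: (H, W, R, R) per-pixel weights, R odd (an assumption of this sketch)
    """
    h, w = img.shape
    r = kernels.shape[-1]
    offsets = np.arange(r) - r // 2          # e.g. [-1, 0, 1] for R = 3
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    cy = ys + flow[..., 1]                   # flow-displaced sampling centre
    cx = xs + flow[..., 0]
    out = np.zeros_like(img, dtype=float)
    for i, dy in enumerate(offsets):
        for j, dx in enumerate(offsets):
            # Kernel weight times the bilinear sample at the offset position.
            out += kernels[..., i, j] * bilinear_sample(img, cy + dy, cx + dx)
    return out
```

With a kernel that is 1 at its centre and 0 elsewhere, this sketch reduces to plain flow-based bilinear warping; with zero flow it reduces to a spatially varying convolution. Every step is a sum of products, which is why a layer of this kind can pass gradients to both the flow and the kernel estimates.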

Methodology and Noteworthy Results

MEMC-Net addresses a limitation of prior learning-based approaches, which typically estimate either optical flow or compensation kernels but not both. It does so through two components:

Adaptive Warping Layer: This layer synthesizes non-existent frames by combining bilinear warping based on optical flow with learned interpolation kernels. The layer is fully differentiable, allowing for end-to-end training and joint optimization of the motion and kernel estimation networks.
Flow Projection Layer: This layer approximates the flow field at the intermediate frame and uses an outside-in strategy to handle regions where the flow is ambiguous, such as occlusions, which helps keep the synthesized frames spatially continuous.
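As an illustration of what projecting a flow field to an intermediate time can look like, here is a hedged NumPy sketch. It is a simplification built on stated assumptions rather than the paper's layer: the function name `project_flow`, the rounding rule, and the averaging of colliding flow vectors are all choices made for this example, and the outside-in hole filling is not implemented. Pixels that receive no flow vector are only flagged in a mask.

```python
import numpy as np

def project_flow(flow01, t=0.5):
    """Hypothetical forward projection of a frame-0-to-frame-1 flow to time t.

    flow01: (H, W, 2) flow from frame 0 to frame 1, stored as (dx, dy)
    Returns (flow_t0, holes): the estimated flow from time t back to
    frame 0, and a boolean mask of pixels that no source pixel landed on.
    """
    h, w, _ = flow01.shape
    acc = np.zeros((h, w, 2))   # sum of flow vectors landing on each pixel
    cnt = np.zeros((h, w))      # number of vectors landing on each pixel
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each frame-0 pixel is at time t (np.rint rounds halves to even).
    tx = np.rint(xs + t * flow01[..., 0]).astype(int)
    ty = np.rint(ys + t * flow01[..., 1]).astype(int)
    inside = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    # np.add.at accumulates correctly when several pixels hit the same target.
    np.add.at(acc, (ty[inside], tx[inside]), flow01[inside])
    np.add.at(cnt, (ty[inside], tx[inside]), 1)
    hit = cnt > 0
    flow_t0 = np.zeros((h, w, 2))
    # Average the colliding vectors, then reverse and scale them by t.
    flow_t0[hit] = -t * acc[hit] / cnt[hit][:, None]
    return flow_t0, ~hit
```

The returned `holes` mask marks exactly the ambiguous regions the text refers to; a fuller version would fill them from surrounding valid pixels before the flow is used for warping.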

MEMC-Net performs comparably to or better than state-of-the-art algorithms on several benchmark datasets, including UCF101, Vimeo90K, and Middlebury. The quantitative evaluations report improvements in PSNR, SSIM, and interpolation error:

UCF101 Dataset: Achieves a PSNR of 35.01 dB, a measure of pixel-level fidelity to the ground-truth frames.
Vimeo90K Dataset: Achieves an SSIM of 0.9742, indicating that structural information is well preserved in the interpolated frames.
Middlebury Benchmark: Achieves a low Interpolation Error (IE), suggesting robustness across varied motion.
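For readers unfamiliar with these metrics, the sketch below computes PSNR and a root-mean-square interpolation error with NumPy. It uses the standard textbook definitions; the peak value of 255 (8-bit images) and the function names are assumptions of this example, and the exact evaluation protocol behind the numbers above is not reproduced here. SSIM is omitted because it needs a windowed computation that would not fit a short example.

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    diff = ref.astype(float) - est.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")     # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def interpolation_error(ref, est):
    """Root-mean-square difference between reference and estimate; lower is better."""
    diff = ref.astype(float) - est.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Both functions are driven by the same squared pixel difference, so a method that lowers IE on a frame also raises PSNR on that frame; the two differ in scale and in how results are aggregated across a dataset.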

Practical and Theoretical Implications

The proposed architecture has several practical implications:

Versatility: Because the adaptive warping layer is not specific to interpolation, MEMC-Net extends to other video enhancement tasks, which suggests broad applicability in real-world video processing.
Efficiency: MEMC-Net's computational efficiency benefits high-resolution video processing and may make it viable in resource-constrained environments.

From a theoretical perspective, this work elucidates the benefit of integrating motion-centric components within a learning-based framework, challenging the prevailing dichotomy between flow-based and kernel-based methods. Furthermore, it opens avenues for developing other hybrid model architectures leveraging multiple complementary learning paradigms.

Future Directions

Given these results, future work could explore further domain-specific adaptations or integrate attention mechanisms to better guide where and how the kernels are applied. The approach could also be refined for video captured under extreme conditions, such as low light or fast motion, potentially through multi-scale network architectures or ensemble learning strategies.

In conclusion, MEMC-Net combines classical motion estimation and compensation with learned components in a single model. Its versatility across interpolation and enhancement tasks makes it a useful basis for further work in both areas.
