- The paper proposes AnimeInterp, a novel framework that overcomes animation interpolation challenges with specialized SGM and RFR modules.
- The approach uses segment-guided matching to manage smooth color areas and recurrent flow refinement to tackle large, non-linear motions.
- The authors introduce the ATD-12K dataset and achieve superior PSNR and SSIM scores compared to state-of-the-art methods.
Deep Animation Video Interpolation in the Wild
In this paper, the authors present a comprehensive study of animation video interpolation, addressing the unique challenges animation poses compared to natural video sequences. Traditional animation production involves drawing frames by hand, which is time-consuming and leads many productions to work at reduced frame rates. This limitation creates demand for computational methods that can generate intermediate frames automatically.
Key Contributions
The authors identify two distinct challenges in animation videos: the lack of texture due to smooth color areas and the presence of large, non-linear motions. To tackle these, they propose AnimeInterp, a novel framework incorporating two specialized modules: Segment-Guided Matching (SGM) and Recurrent Flow Refinement (RFR).
- Segment-Guided Matching (SGM): This module addresses the challenge of smooth color areas by employing global matching among coherent color segments. This coarse-level matching helps circumvent local minima that can arise in regions with low texture.
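The idea of coarse, segment-level matching can be illustrated with a minimal sketch. This is not the paper's actual SGM algorithm: the quantization-based segmentation and all helper names (`color_segments`, `segment_centroids`, `coarse_flow`) are my own simplified stand-ins, matching flat-color regions across frames and using centroid displacement as a coarse flow guess.

```python
import numpy as np

def color_segments(img, levels=8):
    """Label coherent flat-color regions by quantizing channel values.
    (Illustrative stand-in for the paper's color-piece segmentation.)"""
    q = (img * levels).astype(int)
    # Collapse the quantized RGB triple into a single integer label per pixel.
    return q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]

def segment_centroids(labels):
    """Mean (y, x) position of each labeled segment."""
    cents = {}
    for lab in np.unique(labels):
        ys, xs = np.nonzero(labels == lab)
        cents[lab] = (ys.mean(), xs.mean())
    return cents

def coarse_flow(img0, img1, levels=8):
    """Globally match segments of the same flat color across two frames
    and assign each matched segment its centroid displacement as flow."""
    l0, l1 = color_segments(img0, levels), color_segments(img1, levels)
    c0, c1 = segment_centroids(l0), segment_centroids(l1)
    flow = np.zeros(img0.shape[:2] + (2,))
    for lab, (y0, x0) in c0.items():
        if lab in c1:  # the same flat color appears in the next frame
            y1, x1 = c1[lab]
            flow[l0 == lab] = (y1 - y0, x1 - x0)
    return flow
```

Because matching happens at the segment level rather than per pixel, textureless interiors of a color region inherit a consistent motion estimate instead of falling into local minima.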
- Recurrent Flow Refinement (RFR): Built on a transformer-like architecture, this module refines the optical flow predictions recurrently, improving the system's ability to handle the large, non-linear motions typical of animation frames.
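The recurrent-refinement principle — repeatedly warp with the current flow estimate and apply an incremental correction — can be sketched on a toy 1-D problem. The learned update network is replaced here by a brute-force residual search; `warp` and `refine_flow` are hypothetical names, not the paper's implementation.

```python
import numpy as np

def warp(signal, shift):
    """Shift a 1-D signal by an integer offset, zero-filling the border."""
    out = np.zeros_like(signal)
    if shift >= 0:
        out[shift:] = signal[:signal.size - shift]
    else:
        out[:shift] = signal[-shift:]
    return out

def refine_flow(src, dst, iters=10):
    """Recurrently refine a single global flow value: each iteration
    warps the source with the current estimate, then picks the small
    residual update that best matches the target (a toy stand-in for
    the learned recurrent update in RFR-style refinement)."""
    flow = 0
    for _ in range(iters):
        warped = warp(src, flow)
        # Search a small residual window, mimicking an incremental update.
        errs = {d: np.abs(warp(warped, d) - dst).sum() for d in (-1, 0, 1)}
        delta = min(errs, key=errs.get)
        if delta == 0:
            break  # converged: no residual update improves the match
        flow += delta
    return flow
```

Even though each update is small, the recurrence lets the estimate traverse a large total displacement, which is the intuition behind using iterative refinement for large motions.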
A significant contribution is the development of a novel dataset, ATD-12K, which includes 12,000 triplets from various animation films, providing a diverse and robust foundation for training and evaluation.
Experimental Evaluation
The authors evaluate AnimeInterp against state-of-the-art methods like Super SloMo, DAIN, and SoftSplat. AnimeInterp outperforms these methods quantitatively and qualitatively in interpolation tasks. On the ATD-12K test set, AnimeInterp achieves superior PSNR and SSIM scores, particularly excelling in challenging scenarios characterized by large and complex motions.
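For reference, the two reported metrics can be computed as follows. PSNR is the standard definition; the SSIM shown here is a simplified single-window (global) form of the usual locally windowed SSIM, included only to make the formula concrete.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((ref - est) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def global_ssim(ref, est, peak=1.0):
    """SSIM computed over the whole image as one window: a simplified
    global variant of the standard locally windowed SSIM."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # stability constants
    mu_x, mu_y = ref.mean(), est.mean()
    var_x, var_y = ref.var(), est.var()
    cov = ((ref - mu_x) * (est - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

In practice, benchmark numbers are produced with windowed SSIM (e.g., as implemented in scikit-image), so values from this global variant will differ slightly.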
The dataset itself is meticulously curated, with annotations covering difficulty levels and motion categories that enable evaluation across different animation styles and complexities. This detailed categorization has both academic and industrial relevance, offering insights for future research and for applications in animation production.
Implications and Future Work
The research presented in this paper holds significant implications for both the theoretical understanding and practical implementation of animation video interpolation. The proposed approach not only addresses longstanding challenges but also opens avenues for developing more sophisticated models that can handle diverse artistic styles.
Future developments could explore integrating more advanced machine learning architectures to further refine optical flow prediction and improve temporal consistency. Additionally, expanding the dataset to cover more animation styles and complexities could provide a more comprehensive benchmark for future methods.
In summary, this paper provides a rigorous exploration of animation video interpolation, proposing innovative solutions to longstanding challenges and setting a foundational benchmark for future research in this domain. The introduction of AnimeInterp and the ATD-12K dataset represents a significant advance toward high-quality automated animation content generation.