- The paper introduces a novel patch-based training strategy that propagates styles from a few keyframes to entire video sequences for real-time performance.
- It trains an appearance translation network from scratch, using randomly sampled patches to mitigate overfitting, careful hyper-parameter choices to keep training fast, and auxiliary input layers to maintain temporal coherence.
- Experimental results confirm that the method preserves artistic intent and visual quality, offering a competitive solution for interactive video editing workflows.
Interactive Video Stylization Using Few-Shot Patch-Based Training
The paper "Interactive Video Stylization Using Few-Shot Patch-Based Training" presents a novel approach to video stylization, leveraging learning-based methods to propagate artistic styles from a few keyframes across an entire video sequence. This strategy capitalizes on patch-based training instead of traditional full-frame training, aiming for real-time performance and simplicity without extensive pre-training or large datasets.
Methodology
The authors propose a mechanism where an artist stylizes a keyframe, and this style is extended to the entire sequence through an appearance translation network. The network is trained from scratch using only a limited number of style exemplars, ensuring that specific parts of moving objects are stylized according to the intended artistic direction. This is a significant departure from prior style transfer techniques that require lengthy pre-training and large datasets.
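The paper is summarized here only at a high level, so the snippet below is a minimal sketch of what such an appearance translation network could look like: a small encoder-decoder with skip connections, written in PyTorch. The class name, layer counts, and channel widths are illustrative assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn


class AppearanceTranslationNet(nn.Module):
    """Small image-to-image network mapping an input frame to its stylized version."""

    def __init__(self, in_channels=3, out_channels=3, base=32):
        super().__init__()
        # Encoder: two strided convolutions reduce resolution by 4x in total.
        self.enc1 = nn.Sequential(nn.Conv2d(in_channels, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1), nn.ReLU())
        # Decoder: transposed convolutions upsample back, with skip connections.
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(base * 4, base * 2, 4, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(base * 4, base, 4, stride=2, padding=1), nn.ReLU())
        self.out = nn.Conv2d(base * 2, out_channels, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)                           # full resolution
        e2 = self.enc2(e1)                          # 1/2 resolution
        e3 = self.enc3(e2)                          # 1/4 resolution
        d2 = self.dec2(e3)                          # back to 1/2 resolution
        d1 = self.dec1(torch.cat([d2, e2], dim=1))  # back to full resolution
        return torch.sigmoid(self.out(torch.cat([d1, e1], dim=1)))
```

Because a network like this is fully convolutional, it can be trained on small patches yet applied to full frames at inference time, which is what makes the patch-based strategy described next possible.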
Key Innovations:
- Patch-Based Training Strategy: To mitigate overfitting, the network is trained on small patches sampled at random from the stylized keyframes rather than on whole frames, which improves generalization to unseen frames (a training-loop sketch follows this list).
- Hyper-Parameter Optimization: The paper systematically examines the impact of hyper-parameter settings such as patch size and batch size on the balance between training time and output quality.
- Temporal Coherence: Stylizing each frame independently tends to produce flickering; the authors counter this by introducing auxiliary input layers containing a mixture of colored Gaussians that give the network a stable spatial cue and help maintain temporal consistency (a sketch of such a guidance layer also follows this list).
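As a rough illustration of the patch-based strategy, the sketch below crops aligned random patches from the unstyled keyframe and its stylized counterpart and trains the network on them with a plain L1 loss. The patch size, batch size, step count, and the use of L1 alone are simplifying assumptions for readability; the paper's actual objective and schedule may differ.

```python
import torch


def sample_patches(keyframe, stylized, patch_size=32, batch_size=16):
    """Crop aligned random patches from a (C, H, W) input/target frame pair."""
    _, h, w = keyframe.shape
    inputs, targets = [], []
    for _ in range(batch_size):
        y = torch.randint(0, h - patch_size + 1, (1,)).item()
        x = torch.randint(0, w - patch_size + 1, (1,)).item()
        inputs.append(keyframe[:, y:y + patch_size, x:x + patch_size])
        targets.append(stylized[:, y:y + patch_size, x:x + patch_size])
    return torch.stack(inputs), torch.stack(targets)


def train_few_shot(net, keyframe, stylized, steps=2000, lr=2e-4):
    """Train the translation network from scratch on a single exemplar pair."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        x, y = sample_patches(keyframe, stylized)
        loss = torch.nn.functional.l1_loss(net(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```

Sampling patches rather than feeding the whole keyframe effectively turns a single exemplar into thousands of small training examples, which is why the network can generalize to the rest of the sequence instead of memorizing one frame.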
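The auxiliary input can be pictured as a few colored Gaussian blobs rendered at anchor positions and concatenated to the RGB frame as extra channels; in the video setting the anchors would move with the object so that the same spatial cue follows the same region across frames. The function name, sigma, and the way anchors are chosen below are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def gaussian_mixture_layer(height, width, centers, colors, sigma=20.0):
    """Render colored Gaussian blobs into a (3, H, W) guidance image.

    centers: (N, 2) tensor of (y, x) positions; colors: (N, 3) tensor in [0, 1].
    """
    ys = torch.arange(height, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, -1)
    guide = torch.zeros(3, height, width)
    for (cy, cx), color in zip(centers, colors):
        blob = torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        guide += color.view(3, 1, 1) * blob  # accumulate this colored blob
    return guide.clamp(0, 1)


# Concatenating the guidance to the RGB frame gives the network 6 input channels,
# so in_channels of the translation network would be set to 6 in this variant:
# guided_frame = torch.cat([frame, gaussian_mixture_layer(h, w, centers, colors)], dim=0)
```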
Results
Experiments across diverse video sequences show that the network can stylize complex animations while preserving artistic intent and temporal coherence, with visual quality comparable to the state of the art. Compared with existing keyframe-based video stylization algorithms, the method is competitive in quality while training and inference are significantly faster, making real-time stylization practical in professional video editing workflows.
Implications and Future Directions
The ability to stylize video in real-time opens up fascinating avenues for live and interactive applications. Artists and creators can directly manipulate video sequences, receiving immediate feedback on stylistic changes. This shift represents a substantial advancement toward integrating machine learning more seamlessly into creative processes.
However, there remain challenges, chiefly handling appearance changes when objects undergo significant transformations (like rotations). Future work might explore segmentation and enhanced registration techniques to address such issues more completely.
Overall, this research provides valuable insights into video stylization, merging artistic creativity with machine learning efficiency, promising enhancements in digital art and animation production. Further exploration could extend its capabilities to higher-resolution videos and more complex scene dynamics.