- The paper introduces a luminance warping constraint that reduces flickering and improves temporal consistency in style-transferred videos.
- It implements a feature-map-level temporal loss to maintain consistent features of moving objects across frames.
- ReCoNet achieves over 200 FPS inference, striking a balance between speed and high perceptual style quality.
ReCoNet: Real-time Coherent Video Style Transfer Network
The research article "ReCoNet: Real-time Coherent Video Style Transfer Network" introduces a novel approach to video style transfer using convolutional neural networks. The primary objective is to improve the temporal consistency of style-transferred videos while ensuring real-time processing speed and maintaining perceptual style quality. The work addresses a prevalent issue with existing image style transfer models: severe temporal inconsistency, visible as flickering, when they are applied to videos frame by frame.
The authors propose ReCoNet to balance temporal coherence and perceptual quality without compromising processing speed. At the core of the method is a novel luminance warping constraint applied to the temporal loss at the output level, which accounts for luminance changes between consecutive frames. In addition, a feature-map-level temporal loss enforces temporal consistency on traceable objects as they move across frames.
Key Contributions
The following key aspects highlight the importance of the ReCoNet approach:
- Luminance Warping Constraint: The authors incorporate a luminance warping constraint into the temporal loss at the output level. This constraint captures luminance changes between consecutive frames and is crucial for mitigating flickering artifacts caused by illumination variations (see the loss sketch after this list).
- Feature-map-level Temporal Loss: This loss is applied to the encoded feature maps to improve the temporal consistency of traceable objects, so that the same object retains consistent features across frames despite motion (also shown in the sketch below).
- Real-time Processing Capabilities: ReCoNet achieves an inference speed exceeding 200 FPS on a modern GPU. This efficiency stems from its lightweight feed-forward encoder-decoder architecture, which processes each frame independently without relying on previous frames at inference time (a minimal inference loop is sketched below).
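To make the two temporal losses concrete, here is a minimal PyTorch-style sketch based on the descriptions above. It is not the authors' reference implementation: the tensor names (`out_t`, `out_prev_warped`, `in_t`, `in_prev_warped`, `feat_t`, `feat_prev_warped`), the occlusion `mask`, and the Rec. 709 luminance weights are assumptions introduced for illustration; the warped tensors would come from pre-computed optical flow, as in the paper.

```python
import torch

def relative_luminance(rgb):
    """Relative luminance of an RGB batch (N, 3, H, W) -> (N, 1, H, W), Rec. 709 weights (assumed)."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def output_temporal_loss(out_t, out_prev_warped, in_t, in_prev_warped, mask):
    """Output-level temporal loss with the luminance warping constraint.

    The change between the current stylized frame and the warped previous
    stylized frame is pushed toward the *luminance* change of the input
    frames, so genuine illumination shifts are not penalized as flicker.
    `mask` marks non-occluded pixels where the optical flow is reliable.
    """
    input_lum_diff = relative_luminance(in_t) - relative_luminance(in_prev_warped)  # (N, 1, H, W)
    output_diff = out_t - out_prev_warped                                           # (N, 3, H, W)
    per_channel = (output_diff - input_lum_diff) ** 2  # luminance target broadcast to every channel
    return (mask * per_channel).mean()

def feature_temporal_loss(feat_t, feat_prev_warped, feat_mask):
    """Feature-map-level temporal loss: encoder features of a traceable object
    should stay consistent across frames. The flow and occlusion mask are
    assumed to be downsampled to the feature-map resolution."""
    return (feat_mask * (feat_t - feat_prev_warped) ** 2).mean()
```

During training these two losses would be combined with the usual content, style, and total-variation terms; the loss weights and the exact normalization used in the paper are omitted here.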
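The temporal losses are used only at training time, so inference reduces to running the feed-forward network on each frame independently, which is what makes the 200+ FPS figure possible. A hypothetical frame-by-frame loop might look like the following; the `model` object and the shape conventions are placeholders, not the paper's code.

```python
import torch

@torch.no_grad()
def stylize_video(model, frames, device="cuda"):
    """Stylize frames one by one; no optical flow or previous output is
    needed at inference time, so every frame is processed independently."""
    model.eval().to(device)
    outputs = []
    for frame in frames:  # each frame: (3, H, W) float tensor in [0, 1]
        styled = model(frame.unsqueeze(0).to(device))
        outputs.append(styled.squeeze(0).cpu())
    return outputs
```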
Experimental Validation
The efficacy of ReCoNet is verified through experimental comparisons with existing methods. The results show that, while maintaining temporal coherence, ReCoNet achieves lower temporal error than the model of Chen et al. and temporal error comparable to that of Huang et al., while producing noticeably better perceptual style quality than those models and stylization comparable to traditional image-based methods.
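For context, the temporal error used in such comparisons is commonly defined as the root-mean-square difference between the current stylized frame and the warped previous stylized frame over non-occluded pixels. The sketch below is an assumed formulation of that metric for illustration, not the paper's evaluation code.

```python
import torch

def temporal_error(out_t, out_prev_warped, mask):
    """RMS stability error between consecutive stylized frames, measured
    only where the optical flow is valid (mask == 1)."""
    sq_diff = (mask * (out_t - out_prev_warped) ** 2).sum()
    denom = mask.expand_as(out_t).sum().clamp(min=1)
    return torch.sqrt(sq_diff / denom)
```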
Implications and Future Work
ReCoNet's ability to perform real-time video style transfer with improved temporal stability has both practical and theoretical implications. Practically, it points to applications in video editing, film production, and real-time video streaming, allowing style transfer to be integrated seamlessly into dynamic scenes. Theoretically, the luminance warping constraint opens avenues for further exploration of how luminance and chromaticity differences can be incorporated into style transfer, potentially broadening the scope of perceptual consistency in the video domain.
Future research could focus on optimizing these methods further and exploring adaptive techniques that dynamically adjust to varying textures and movements in videos.
In conclusion, ReCoNet makes significant strides in the field of video style transfer, presenting a robust solution that preserves both the temporal and perceptual integrity of stylized videos in real time. The proposed innovations in loss function design, specifically targeting luminance and feature-level discrepancies, bridge gaps present in prior methodologies, thus contributing notably to the field of computer vision and deep learning in artistic video generation.