- The paper introduces novel methods that extend style transfer from static images to dynamic video and spherical media.
- It leverages temporal coherence losses and deep neural networks with optical flow to achieve consistent stylization across frames.
- Quantitative and qualitative results demonstrate superior performance over baselines, paving the way for scalable VR content creation.
Insights on Artistic Style Transfer for Videos and Spherical Images
This paper marks a significant advance in style transfer, extending the technique from static images to video sequences and spherical imagery. Building on the foundational approach of Gatys et al., which performs style transfer by minimizing an energy that combines content and style terms, the work addresses two key challenges: achieving temporal consistency across video frames and adapting style transfer to emerging virtual reality formats.
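For reference, the per-frame objective is essentially the Gatys et al. energy: a content term on CNN activations plus a style term on Gram matrices. The sketch below illustrates that energy in PyTorch, assuming feature maps have already been extracted from a pretrained network such as VGG; layer names and loss weights are illustrative, not the paper's exact settings.

```python
import torch

def gram_matrix(feat):
    # feat: (C, H, W) feature map taken from one CNN layer (e.g., a VGG layer)
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.t() / (c * h * w)

def style_transfer_energy(gen_feats, content_feats, style_feats,
                          content_layer="conv4_2", alpha=1.0, beta=1e3):
    # Content term: keep high-level activations close to the content image.
    content_loss = torch.mean(
        (gen_feats[content_layer] - content_feats[content_layer]) ** 2)
    # Style term: match second-order feature statistics (Gram matrices)
    # of the style image across several layers.
    style_loss = sum(
        torch.mean((gram_matrix(gen_feats[l]) - gram_matrix(style_feats[l])) ** 2)
        for l in style_feats)
    return alpha * content_loss + beta * style_loss
```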
Methodological Innovations
The authors devised two main approaches. The first adapts the original Gatys et al. optimization to video by enforcing temporal coherence: a new initialization scheme (starting each frame from the warped previous result) and additional temporal loss functions handle discrepancies between frames caused by motion and occlusion. The temporal consistency loss uses optical flow estimates to penalize deviations along point trajectories, while motion boundaries and disoccluded regions are masked out so that newly visible content can be stylized afresh.
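The following sketch shows one way such a temporal consistency term can be written, assuming the optical flow field and a per-pixel reliability mask (zero in disocclusions and at motion boundaries) are precomputed; the function names and warping details are illustrative rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def warp(prev, flow):
    # Backward-warp the previous stylized frame into the current frame
    # using a dense optical flow field (flow: (2, H, W), in pixels).
    _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = (xs + flow[0]) / (w - 1) * 2 - 1   # normalize to [-1, 1]
    grid_y = (ys + flow[1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1).unsqueeze(0).float()
    return F.grid_sample(prev.unsqueeze(0), grid, align_corners=True)[0]

def temporal_loss(curr_stylized, prev_stylized, flow, mask):
    # mask is 1 where the flow is reliable and 0 in disoccluded regions
    # and at motion boundaries, so those pixels are free to change.
    warped_prev = warp(prev_stylized, flow)
    diff = mask * (curr_stylized - warped_prev) ** 2
    return diff.sum() / mask.sum().clamp(min=1.0)
```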
The second approach casts video stylization as a learning problem and proposes a deep neural network architecture tailored for it. The network is trained to handle sequences of arbitrary length and runs at near real-time speed. A key design choice is feeding the previously stylized frame back in as input, which mitigates the temporal inconsistencies that arise when frames are stylized independently; incorporating optical flow information further improves robustness against flickering.
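A rough sketch of how such a feed-forward network might be applied frame by frame, with the previous stylized output warped into the current frame (reusing the warp helper from the sketch above); the exact input layout and channel ordering are assumptions, not the paper's specification.

```python
import torch

def stylize_video(frames, flows, masks, stylize_net, warp):
    # frames:  list of (3, H, W) video frames
    # flows:   flows[t] is the backward flow used to warp the output at t-1
    #          into frame t
    # masks:   per-pixel flow reliability (0 in disocclusions)
    # stylize_net: a feed-forward network that takes the current frame plus
    #              the warped previous output and mask as extra channels
    outputs = []
    prev = torch.zeros_like(frames[0])           # no prior output at t = 0
    prev_mask = torch.zeros_like(frames[0][:1])
    for t, frame in enumerate(frames):
        if t > 0:
            prev = warp(outputs[-1], flows[t])
            prev_mask = masks[t]
        net_in = torch.cat([frame, prev, prev_mask], dim=0).unsqueeze(0)
        outputs.append(stylize_net(net_in)[0])
    return outputs
```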
Numerical Outcomes and Practical Implications
The reported results indicate clear improvements over baseline methods, both qualitatively and quantitatively. Using several benchmarks, including the challenging MPI Sintel dataset, the authors demonstrate a measurable reduction in temporal artifacts. The paper also proposes a straightforward way to extend these methods to spherical videos and images, which is especially relevant for virtual reality: stylization is performed on a cube map projection, and consistency is enforced along the borders of adjacent faces so that the style remains coherent across face boundaries.
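As a toy illustration, the sketch below penalizes color mismatches along the borders of adjacent cube-map faces; it ignores the edge-orientation bookkeeping and the overlapping-border scheme a full implementation would need, and all names are hypothetical.

```python
import torch

def cube_edge_loss(faces, neighbors):
    # faces: dict of stylized (3, H, W) cube-map faces, e.g. "front", "left", ...
    # neighbors: list of (face_a, edge_a, face_b, edge_b) tuples describing
    #            which borders of which faces meet on the sphere.
    def border(img, edge):
        return {"left": img[:, :, 0], "right": img[:, :, -1],
                "top": img[:, 0, :], "bottom": img[:, -1, :]}[edge]
    loss = 0.0
    for face_a, edge_a, face_b, edge_b in neighbors:
        # Penalize color differences along adjacent borders so the
        # stylization stays seamless across cube faces.
        loss = loss + torch.mean(
            (border(faces[face_a], edge_a) - border(faces[face_b], edge_b)) ** 2)
    return loss
```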
Theoretical Implications and Future Directions
This research broadens the theoretical framework of style transfer by accommodating temporal dynamics and non-planar projections. The introduction of a learning-based system for video could lead to models capable of capturing and emulating more complex temporal patterns in video content. Future work could focus on improving real-time performance and integrating more adaptive optical flow computation to reduce computational overhead and increase accuracy.
Moreover, as video data grows increasingly complex and abundant, refining these models to work seamlessly with high-resolution video and ensuring consistency across varied stylistic inputs are promising directions. The potential for applying style transfer in interactive media, virtual production, and immersive content creation is substantial, prompting further exploration of user-interactive style applications and real-world usability.
Conclusion
The authors have presented a compelling extension of artistic style transfer, addressing the core challenges of applying static-image techniques to videos and spherical media. By combining deep learning with traditional optimization, they provide a robust framework that handles temporal consistency in video while anticipating the needs of rapidly advancing VR technology. As such, this paper is both a technical milestone and a precursor to more scalable and creative applications of style transfer in audiovisual media.