A Style-Aware Content Loss for Real-time HD Style Transfer (1807.10201v2)

Published 26 Jul 2018 in cs.CV

Abstract: Recently, style transfer has received a lot of attention. While much of this research has aimed at speeding up processing, the approaches are still lacking from a principled, art historical standpoint: a style is more than just a single image or an artist, but previous work is limited to only a single instance of a style or shows no benefit from more images. Moreover, previous work has relied on a direct comparison of art in the domain of RGB images or on CNNs pre-trained on ImageNet, which requires millions of labeled object bounding boxes and can introduce an extra bias, since it has been assembled without artistic consideration. To circumvent these issues, we propose a style-aware content loss, which is trained jointly with a deep encoder-decoder network for real-time, high-resolution stylization of images and videos. We propose a quantitative measure for evaluating the quality of a stylized image and also have art historians rank patches from our approach against those from previous work. These and our qualitative results ranging from small image patches to megapixel stylistic images and videos show that our approach better captures the subtle nature in which a style affects content.

Authors (4)
  1. Artsiom Sanakoyeu (25 papers)
  2. Dmytro Kotovenko (8 papers)
  3. Sabine Lang (3 papers)
  4. Björn Ommer (72 papers)
Citations (205)

Summary

A Style-Aware Content Loss for Real-time HD Style Transfer

The paper "A Style-Aware Content Loss for Real-time HD Style Transfer" presents a novel approach to style transfer by introducing a style-aware content loss function. The authors critique previous methods for either relying on a single style image or requiring a particular similarity in content between style and content images, both of which are limitations the new approach seeks to overcome.

Methodology

The approach utilizes an encoder-decoder architecture combined with an adversarial discriminator, which allows for real-time, high-resolution style transfer of both images and videos. The key innovation is the style-aware content loss, which is trained jointly with the encoder-decoder network. This loss does not depend on a pre-trained classifier like VGG16, which has been the staple in prior work, thereby avoiding potential biases introduced by ImageNet pre-training.
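
A minimal PyTorch sketch of such an encoder-decoder generator paired with a patch-level discriminator is given below; the layer sizes, strides, and module names are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only: a small encoder-decoder generator and patch
# discriminator in the spirit of the architecture described above.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Latent feature map intended to encode style-relevant content
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(channels * 2, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        # Stylized image reconstructed from the latent representation
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, 1, 4, stride=2, padding=1),  # patch-level real/fake logits
        )

    def forward(self, x):
        return self.net(x)
```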

Instead of a fixed similarity measure, the style-aware loss computes a distance in a latent space that is learned jointly with the stylization network and tailored to encode style-relevant content, making it adaptable to different styles. This is crucial because the manner in which content should be preserved can vary greatly across artistic styles. The encoder-decoder network is trained from scratch, bypassing the need for supervision or pre-training on large labeled datasets.
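
Under the assumption that the jointly trained encoder defines the latent space in which content is compared, the style-aware content loss and the generator objective could be sketched as follows; the L1 distance, the binary cross-entropy adversarial term, and the weighting factor `lam` are illustrative choices, not necessarily those used in the paper.

```python
# Sketch of the style-aware content loss: distance between input and
# stylization measured in the jointly trained encoder's latent space.
import torch
import torch.nn.functional as F

def style_aware_content_loss(encoder, content_img, stylized_img):
    z_content = encoder(content_img)
    z_stylized = encoder(stylized_img)
    # Small distance -> style-relevant content of the input is preserved
    return F.l1_loss(z_stylized, z_content)

def generator_loss(discriminator, encoder, content_img, stylized_img, lam=1.0):
    # Adversarial term pushes stylizations toward the style distribution;
    # `lam` is an assumed weighting hyperparameter.
    logits = discriminator(stylized_img)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return adv + lam * style_aware_content_loss(encoder, content_img, stylized_img)
```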

Style Image Grouping

An effective strategy for grouping style images is crucial to this methodology. The paper uses a classification network trained to predict the artist, allowing related style images to be found automatically. This makes the approach more scalable and adaptable, since it operates without extensive manual selection of style images and can cover a broader spectrum of artistic diversity.
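
As an illustration, one way to realize such grouping is to embed images with the artist classifier and keep the nearest neighbors of a query style image; the feature-extraction interface `artist_clf.features` and the cosine-similarity criterion below are assumptions, and the paper's exact grouping rule may differ.

```python
# Rough sketch: group candidate style images by similarity to a query image
# in the feature space of an artist-prediction network.
import torch
import torch.nn.functional as F

@torch.no_grad()
def group_style_images(artist_clf, candidate_imgs, query_img, top_k=100):
    """Return indices of the candidates closest to the query style image."""
    q_feat = F.normalize(artist_clf.features(query_img.unsqueeze(0)), dim=1)
    feats = F.normalize(artist_clf.features(candidate_imgs), dim=1)
    sims = (feats @ q_feat.t()).squeeze(1)  # cosine similarity to the query
    return sims.topk(min(top_k, len(sims))).indices
```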

Results

Quantitative analysis includes a “style transfer deception rate” metric, which indicates the degree to which stylized outputs can deceive a classifier trained to recognize the original artist. This evaluation is novel and reflective of the method's success in capturing stylistic nuances. Overall, the averaged deception rate substantially exceeds that of previous methods.
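
In code, the deception rate could be computed roughly as the fraction of stylized outputs that a separately trained artist classifier assigns to the target artist; the classifier interface and label convention here are assumptions for illustration.

```python
# Sketch of the "style transfer deception rate" evaluation.
import torch

@torch.no_grad()
def deception_rate(artist_clf, stylized_imgs, target_artist_id):
    logits = artist_clf(stylized_imgs)        # shape (N, num_artists)
    preds = logits.argmax(dim=1)
    # Fraction of stylizations attributed to the target artist
    return (preds == target_artist_id).float().mean().item()
```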

Qualitatively, the paper demonstrates the ability to produce extensive variation across different styles. The high-resolution results preserve essential fine details, such as brushstrokes and color transitions, which past methods often failed to reproduce.

Implications and Future Work

Practically, this work has significant implications for applications requiring real-time processing of style transformations. The ability to apply high-quality style transfers to video frames would prove beneficial in areas such as film post-production and interactive art installations. Theoretically, this contributes to advancing representation learning without reliance on biased pre-trained models.

Future work might explore further fine-tuning of the style aggregation strategy, optimizing the architecture for even higher resolutions, or incorporating domain-specific constraints to handle more abstract or deviating art forms. Enhanced methods for automatically curating and managing style image datasets would also be vital in further exploiting the approach’s potential.

Overall, this paper outlines a significant step forward in style transfer research, combining computational elegance with practical usability and offering a robust framework adaptable to a wide range of artistic styles.
