- The paper introduces conditional instance normalization, enabling a single network to transfer up to 32 styles with 99.8% shared parameters.
- The model achieves competitive content and style loss metrics compared to single-style networks while supporting smooth style interpolation.
- The approach efficiently adapts to new styles by freezing core weights and training only scaling and shifting parameters, reducing computational cost.
An Analysis of "A Learned Representation for Artistic Style"
The paper "A Learned Representation for Artistic Style," authored by Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur from Google Brain, investigates the development of a scalable deep network capable of effectively capturing and representing the artistic styles of a variety of paintings. This research is situated in the domain of style transfer, a subfield of computer vision and machine learning, which involves rendering an image in the style of another while preserving its content.
Historical Context and Related Work
The paper builds on a rich history of contributions from both computer vision and machine learning. Traditional methods, such as those by Efros and Leung (1999) and Wei and Levoy (2000), approached texture synthesis using non-parametric sampling and statistical relationships between pixels. Later advancements, including works by Efros and Freeman (2001) and Liang et al. (2001), expanded these techniques to patch-based approaches and texture transfer. Kwatra et al. (2005) offered an energy minimization perspective, refining the synthesized texture iteratively. Hertzmann et al. (2001) introduced "image analogies," which transfer "filters" from example pairs to target images.
In more recent neural approaches, Gatys et al. (2015, 2016) demonstrated the use of the VGG-19 network for feature extraction in style transfer, modifying a synthesized texture by gradient descent to match feature representations. However, these methods were computationally expensive due to the iterative optimization process involved. Feedforward networks introduced by Ulyanov et al. (2016) and Johnson et al. (2016) provided a solution by converting content images directly to stylized images through learned transformations, albeit at the cost of flexibility since each network was constrained to a single style.
Key Contributions
This paper addresses the scalability limitations of single-purpose style transfer networks. It introduces conditional instance normalization, a straightforward yet impactful modification that enables a single style transfer network to encapsulate multiple styles simultaneously. The primary contributions can be summarized as follows:
- Conditional Instance Normalization: By applying style-specific scaling and shifting parameters immediately after each normalization step, the approach provides an efficient mechanism for handling multiple styles within a single network (a minimal code sketch follows this list).
- Scalability and Efficiency: The network is capable of modeling multiple styles (demonstrated with up to 32 styles in a single model), sharing approximately 99.8% of its parameters across styles. This reduces the number of required parameters significantly compared to training separate networks for each style.
- Embedding Space Representation: The approach establishes an embedding space for styles, enabling not only the capture of various styles but also the interpolation between styles. This allows for the creation of new, blended artistic representations not present in the training data.
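The core mechanism can be sketched in a few lines. The following is an illustrative PyTorch implementation, not the authors' released code; the class name `ConditionalInstanceNorm` and its interface are chosen here for clarity. For a layer with 64 channels and 32 styles, the per-style parameters add only 2 × 32 × 64 values, which is why nearly all weights can be shared.

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm(nn.Module):
    """Instance normalization with one (gamma, beta) pair per style.

    All convolutional weights elsewhere in the network are shared across
    styles; only these per-style scale/shift vectors differ.
    """
    def __init__(self, num_channels, num_styles):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        # One row of scale/shift parameters per style.
        self.gamma = nn.Parameter(torch.ones(num_styles, num_channels))
        self.beta = nn.Parameter(torch.zeros(num_styles, num_channels))

    def forward(self, x, style_id):
        x = self.norm(x)                          # normalize each channel per sample
        g = self.gamma[style_id].view(1, -1, 1, 1)
        b = self.beta[style_id].view(1, -1, 1, 1)
        return g * x + b                          # style-specific scale and shift
```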
Methodology and Results
The paper details the network architecture, training regimen, and conditional instance normalization. The authors use a convolutional neural network with residual blocks, replacing zero padding with reflection (mirror) padding and transposed convolutions with upsampling followed by convolution, which avoids the border and checkerboard artifacts those layers can introduce.
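As an illustration of how these pieces fit together, the sketch below shows one residual block with reflection padding and conditional instance normalization after each convolution, reusing the `ConditionalInstanceNorm` module from the previous sketch. Layer sizes and ordering are illustrative, not the paper's exact configuration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block sketch: reflection padding avoids border artifacts,
    and each convolution is followed by conditional instance normalization."""
    def __init__(self, channels, num_styles):
        super().__init__()
        self.pad = nn.ReflectionPad2d(1)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3)
        self.norm1 = ConditionalInstanceNorm(channels, num_styles)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3)
        self.norm2 = ConditionalInstanceNorm(channels, num_styles)
        self.relu = nn.ReLU()

    def forward(self, x, style_id):
        out = self.relu(self.norm1(self.conv1(self.pad(x)), style_id))
        out = self.norm2(self.conv2(self.pad(out)), style_id)
        return x + out  # skip connection preserves content information
```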
Quantitative evaluations show that the N-styles model converges comparably to independent single-style models in both content and style loss metrics. Qualitative results affirm that the stylized outputs are consistent and visually compelling across diverse styles. The model's flexibility is established through experiments on various artistic styles and interpolation tasks, illustrating the potential for blending and creating novel artistic effects.
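Style interpolation follows directly from the parameterization: blending two styles amounts to taking a convex combination of their scale and shift parameters. A minimal sketch, assuming the `ConditionalInstanceNorm` module above (the helper names are hypothetical):

```python
def interpolate_styles(cin, style_a, style_b, alpha):
    """Return blended (gamma, beta) for a convex combination of two styles.

    alpha=0 gives pure style_a, alpha=1 gives pure style_b.
    """
    gamma = (1 - alpha) * cin.gamma[style_a] + alpha * cin.gamma[style_b]
    beta = (1 - alpha) * cin.beta[style_a] + alpha * cin.beta[style_b]
    return gamma, beta

def apply_blended(cin, x, gamma, beta):
    """Normalize x, then apply the blended scale/shift parameters."""
    x = cin.norm(x)
    return gamma.view(1, -1, 1, 1) * x + beta.view(1, -1, 1, 1)
```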
Furthermore, the paper demonstrates the efficiency of incorporating new styles by freezing the existing network weights and training only the scaling and shifting parameters for the new style. This significantly reduces the computational cost and training time, making the architecture highly adaptable.
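This incremental training can be sketched as freezing all shared weights and optimizing only the per-style scale/shift parameters; because training batches use only the new style's index, gradients reach only that style's row and existing styles are left untouched. The function below is an illustrative assumption about how one might set this up, not the authors' training code.

```python
import torch

def make_finetune_optimizer(model, lr=1e-3):
    """Freeze shared weights; return an optimizer over per-style parameters only."""
    style_params = []
    for name, p in model.named_parameters():
        if name.endswith("gamma") or name.endswith("beta"):
            style_params.append(p)        # per-style scale/shift tables stay trainable
        else:
            p.requires_grad = False       # shared convolutional weights are frozen
    return torch.optim.Adam(style_params, lr=lr)
```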
Implications and Future Directions
The implications of this research are twofold:
- Practical Applications: Style transfer applications, particularly on resource-constrained devices like mobile phones, will benefit directly from the reduced memory footprint and computational efficiency.
- Theoretical Insights: The flexible capturing of stylistic elements suggests deeper insights into how neural networks learn and represent visual textures and high-level artistic styles.
Future research directions may explore:
- Generative Models of Style: Utilizing the style embedding space to generate new styles, potentially modeling entire artistic movements and their variations.
- Selective Style Transfer: Enhanced techniques to selectively apply style transformations, aligning with recent advancements that separate color, spatial information, and scale in style representations.
- Predictive Style Representations: Developing models that predict style embeddings from style images directly, avoiding the need for separate training steps for new styles.
Conclusion
The paper offers a notable advancement in the field of style transfer, presenting a method that balances efficiency and flexibility. By leveraging conditional instance normalization, the authors achieve a scalable, multifaceted model capable of producing high-quality stylized images across diverse artistic styles. This research not only contributes to practical applications but also paves the way for future explorations into the neural representation and generation of artistic styles.