Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer
The paper by Wang et al., entitled "Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer," presents an advanced approach to the artistic style transfer problem, emphasizing the transfer precision on high-resolution images. The authors propose a hierarchical convolutional neural network architecture, which offers a multilayered, multimodal method for conducting artistic style transfer with improved fidelity and efficiency compared to existing methods.
In this context, style transfer pertains to the manipulation of digital content to adopt the aesthetic characteristics of a given artwork, widely appreciated within computer vision and digital content creation due to its practical and artistic implications. Recent advances have seen the field move toward near-real-time feed-forward processing, supplanting traditionally slower, iterative optimization. Despite these improvements, handling high-resolution images without diminishing local texture fidelity has remained a challenge, with prior methods often misrepresenting the small, intricate textures critical to distinct artistic styles.
The proposed method introduces a multimodal convolutional neural network (CNN) that integrates representations of color and luminance channels, facilitating stylization through a hierarchical framework: a tiered system that applies losses at multiple scales. This grants the model the ability to capture not only large-scale, evident style cues but also subtle and intricate textures, effectively simulating authentic artistic styles.
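To make the color/luminance separation concrete, the sketch below splits an RGB image into a luminance channel and a color residual that can later be recombined. The BT.601 luma weights are an illustrative choice; the paper's exact color-space conversion is not reproduced here.

```python
import numpy as np

# ITU-R BT.601 luma weights -- an illustrative choice, not necessarily
# the exact conversion used by the authors.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def split_luminance_color(rgb):
    """Split an H x W x 3 RGB image into a luminance channel and a
    per-channel color residual, so each can be fed to a separate model."""
    luminance = rgb @ LUMA_WEIGHTS               # H x W
    color_residual = rgb - luminance[..., None]  # chroma offset per channel
    return luminance, color_residual

def recombine(luminance, color_residual):
    """Invert the split: add the (possibly stylized) luminance back."""
    return luminance[..., None] + color_residual
```

Because the residual is defined relative to the luminance, the split is exactly invertible, which is what lets a luminance-only stylization pass be merged back with the original color information.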
The paper's key contributions include:
- Hierarchical Network Design: A novel architecture and training regime that learns both coarse and detailed textural representations of an artistic style through multiple scales of a style image.
- Multimodal Approach: The use of separate models for color and luminance channels, enhancing the ability to capture nuanced artistic details in the style transfer.
- Flexible Up-Scaling: Leveraging hierarchical training allows the network to extrapolate learned styles to high-resolution images without loss of texture fidelity.
- Empirical Validation: Comprehensive experimentation demonstrates the network's superior performance over singular transfer methods, particularly in its ability to maintain intricate and subtle style characteristics on larger images.
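The hierarchical design learns coarse and fine texture from multiple scales of the style image. A minimal stand-in for those multi-scale style targets is a simple average-pooling pyramid; the number of scales and the pooling scheme here are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def style_pyramid(style_img, num_scales=3):
    """Build a coarse-to-fine pyramid of an H x W x C style image by
    repeated 2x2 average pooling. Each level can serve as the style
    target for one subnetwork in a hierarchical setup (illustrative)."""
    levels = [style_img]
    for _ in range(num_scales - 1):
        img = levels[-1]
        h, w = img.shape[0] // 2, img.shape[1] // 2
        # 2x2 average pooling over non-overlapping blocks
        pooled = img[:2 * h, :2 * w].reshape(h, 2, w, 2, -1).mean(axis=(1, 3))
        levels.append(pooled)
    return levels[::-1]  # coarsest scale first
```

Training the coarsest subnetwork against the smallest level and progressively finer subnetworks against larger levels mirrors the coarse-to-detailed progression described above.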
The network's architecture implements a cascade of transformations through style, enhance, and refine subnetworks, each contributing to stylization at a different scale. Each subnetwork is equipped with its own set of stylization losses and operates at a designated resolution. The multimodal loss function, incorporating content- and style-derived terms, guides the network's training; it combines texture and style synthesis while preserving spatial consistency with the content image.
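The content-plus-style loss structure described above can be sketched as a weighted sum of a feature-reconstruction term and a Gram-matrix term, the standard style representation in this line of work. The weights and single-layer setup below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a C x H x W feature map: channel co-occurrence
    statistics commonly used as the style representation."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def multimodal_loss(content_feats, style_feats, generated_feats,
                    content_weight=1.0, style_weight=10.0):
    """Weighted sum of a feature-reconstruction (content) loss and a
    Gram-based style loss over one feature map. Weights are
    illustrative; a hierarchical model sums such terms per scale."""
    content_loss = np.mean((generated_feats - content_feats) ** 2)
    style_loss = np.mean((gram_matrix(generated_feats)
                          - gram_matrix(style_feats)) ** 2)
    return content_weight * content_loss + style_weight * style_loss
```

In the hierarchical setting, one such loss would be evaluated per subnetwork at its designated resolution and the results summed, so coarse scales constrain layout while fine scales constrain texture.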
Practical implications of this research are particularly significant in digital media and entertainment industries, where dynamic and intricate style transfer capabilities enhance creative possibilities. Theoretical considerations suggest the hierarchical architecture offers a scalable solution, potentially adaptable to other image manipulation contexts requiring high fidelity.
Future work in stylistic transfer could consider alternative loss functions that might offer improved granularity in style capture. Moreover, exploring different designs for the loss networks might reduce computational overhead and extend these techniques to even larger images. The continued evolution of multimodal approaches holds promise for advancing their applicability across diverse visual media.