Multi-style Generative Network for Real-time Transfer (1703.06953v2)

Published 20 Mar 2017 in cs.CV

Abstract: Despite the rapid progress in style transfer, existing approaches using a feed-forward generative network for multi-style or arbitrary-style transfer are usually compromised in image quality and model flexibility. We find it is fundamentally difficult to achieve comprehensive style modeling using 1-dimensional style embedding. Motivated by this, we introduce the CoMatch Layer, which learns to match the second-order feature statistics with the target styles. With the CoMatch Layer, we build a Multi-style Generative Network (MSG-Net), which achieves real-time performance. We also employ a specific strategy of upsampled convolution which avoids checkerboard artifacts caused by fractionally-strided convolution. Our method achieves superior image quality compared to state-of-the-art approaches. The proposed MSG-Net, as a general approach for real-time style transfer, is compatible with most existing techniques including content-style interpolation, color-preserving, spatial control and brush stroke size control. MSG-Net is the first to achieve real-time brush-size control in a purely feed-forward manner for style transfer. Our implementations and pre-trained models for Torch, PyTorch and MXNet frameworks will be publicly available.

Citations (272)

Summary

  • The paper presents MSG-Net featuring a novel CoMatch Layer that matches second-order feature statistics to enhance style transfer quality.
  • It employs upsampled convolution and Upsample Residual Blocks to mitigate artifacts and lower computational complexity while maintaining high fidelity.
  • MSG-Net achieves over 90 fps at 256x256 resolution, offering scalable multi-style transfer with real-time brush stroke size control and style interpolation.

Overview of Multi-style Generative Network for Real-time Transfer

The paper "Multi-style Generative Network for Real-time Transfer" authored by Hang Zhang and Kristin Dana presents a novel approach to the image style transfer problem by introducing significant enhancements in both image quality and model versatility. This work builds on the existing domain of feed-forward generative networks aimed at style transfer, identifying and addressing key limitations regarding the quality of image generation and flexibility across multiple styles.

Motivation and Approach

Traditional methods in the style transfer domain have often been constrained by 1D style embeddings, which make comprehensive style modeling difficult. Addressing this fundamental difficulty, the authors propose a new architectural component, the CoMatch Layer, which learns to match second-order feature statistics (Gram matrices) with those of the target styles. This approach exploits the greater expressive power of 2D representations over 1D embeddings for encoding style. Implementing the CoMatch Layer within a Multi-style Generative Network (MSG-Net), the paper demonstrates compatibility and strong performance across a variety of styles in real-time scenarios.
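
To make the idea concrete, the following is a minimal PyTorch sketch of a CoMatch-style layer, not the authors' released implementation: it recombines content features through a learned matrix and a target Gram matrix so that the output's second-order statistics move toward those of the style. The class and method names (CoMatch, set_target) are illustrative assumptions.

```python
import torch
import torch.nn as nn

def gram_matrix(feat):
    # feat: (B, C, H, W) -> normalized Gram matrix (B, C, C)
    b, c, h, w = feat.size()
    f = feat.view(b, c, h * w)
    return f.bmm(f.transpose(1, 2)) / (c * h * w)

class CoMatch(nn.Module):
    """Sketch of a CoMatch-style layer: re-colors content features so their
    second-order statistics move toward a target style Gram matrix."""
    def __init__(self, channels):
        super().__init__()
        # Learned matrix mediating between content features and the style Gram.
        self.weight = nn.Parameter(torch.eye(channels).unsqueeze(0))
        self.gram = None  # target style statistics, set via set_target()

    def set_target(self, style_feat):
        # style_feat: (1, C, H, W) features of the style image
        self.gram = gram_matrix(style_feat)  # (1, C, C)

    def forward(self, x):
        b, c, h, w = x.size()
        f = x.view(b, c, h * w)                                   # (B, C, HW)
        target = self.weight.expand(b, c, c).bmm(self.gram.expand(b, c, c))
        return target.bmm(f).view(b, c, h, w)                     # matched features
```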

Another architectural innovation is the strategy of upsampled convolution, which mitigates the checkerboard artifacts often introduced by fractionally strided (transposed) convolutions. By first upsampling the feature map and then applying an integer-stride convolution, MSG-Net maintains high fidelity in the generated images. The architecture also incorporates an Upsample Residual Block, adapted from the bottleneck residual architecture, which reduces computational complexity while retaining style versatility.
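
A hedged sketch of the upsampled-convolution pattern is shown below; this is the standard upsample-then-convolve recipe, with names and padding choices chosen for illustration rather than taken from the released code.

```python
import torch.nn as nn

class UpsampleConv(nn.Module):
    """Nearest-neighbor upsampling followed by an integer-stride convolution,
    used in place of a fractionally strided (transposed) convolution to
    avoid checkerboard artifacts."""
    def __init__(self, in_ch, out_ch, kernel_size=3, scale=2):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode='nearest')
        self.pad = nn.ReflectionPad2d(kernel_size // 2)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride=1)

    def forward(self, x):
        # Enlarge first, then convolve with stride 1: every output pixel is
        # computed the same way, so no periodic intensity pattern appears.
        return self.conv(self.pad(self.upsample(x)))
```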

Numerical Results and Performance

This work reports notable improvements in style transfer outputs, with superior image quality compared to existing state-of-the-art methods. MSG-Net achieves real-time performance, running at over 90 fps (frames per second) on 256x256 images on high-end GPUs. Furthermore, the architecture allows real-time brush stroke size control, previously not achievable in a purely feed-forward manner, introducing a new dimension of user control in style applications.
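
Brush-stroke scale is controlled by changing the resolution at which the style image is presented to the network at inference time; no retraining is needed. A minimal sketch under that assumption follows (F.interpolate is real PyTorch; net.set_target is the hypothetical hook from the CoMatch sketch above, standing in for whatever the released models expose).

```python
import torch.nn.functional as F

def stylize_with_brush_size(net, content, style, brush_size=512):
    # Resizing the style image before recomputing its Gram targets changes
    # the apparent stroke scale of the result: a larger style image yields
    # finer strokes, a smaller one yields coarser strokes.
    style = F.interpolate(style, size=(brush_size, brush_size),
                          mode='bilinear', align_corners=False)
    net.set_target(style)   # hypothetical API: recompute style Gram targets
    return net(content)
```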

Implications and Future Directions

The implications of MSG-Net are noteworthy for both theory and practice in computer vision and graphics. The ability to train a single multi-style model across 1,000 styles demonstrates scalability and robustness. The method's compatibility with standard style-transfer controls, such as content-style interpolation and color preservation, further underscores its utility in creative workflows; a sketch of how such interpolation can be expressed in this formulation follows. However, the authors identify the scalability of style representations as an ongoing challenge, acknowledging the need for continued work on high-quality arbitrary style transfer.
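
Because the style target in this formulation is a Gram matrix, interpolating between styles reduces to blending targets. A minimal sketch under that assumption, not the released API:

```python
def blend_style_targets(gram_a, gram_b, alpha):
    # Linearly blend two (1, C, C) style Gram matrices.
    # alpha=0.0 reproduces style A; alpha=1.0 reproduces style B;
    # intermediate values mix the two styles' character.
    return (1.0 - alpha) * gram_a + alpha * gram_b
```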

The insights gleaned from this paper can direct future research endeavors toward enhancing models' architectural generalizability without compromising quality. Potential future work could explore improved efficiency in training multiscale image representations as style transfer tasks become increasingly complex. Additionally, the integration of the MSG-Net with adversarial networks or other innovative learning paradigms could uncover new capabilities in nuanced artistic generation and content transformation tasks.

In conclusion, the Multi-style Generative Network represents a commendable stride in the domain, equipping researchers and practitioners with a robust, efficient tool for real-time artistic and practical applications.