
Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration (1805.03857v2)

Published 10 May 2018 in cs.CV

Abstract: Zero-shot artistic style transfer is an important image synthesis problem that aims to transfer arbitrary styles onto content images. However, the trade-off between generalization and efficiency in existing methods impedes high-quality zero-shot style transfer in real time. In this paper, we resolve this dilemma and propose an efficient yet effective Avatar-Net that enables visually plausible multi-scale transfer for arbitrary styles. The key ingredient of our method is a style decorator that makes up the content features by semantically aligned style features from an arbitrary style image, which not only holistically matches their feature distributions but also preserves detailed style patterns in the decorated features. By embedding this module into an image reconstruction network that fuses multi-scale style abstractions, Avatar-Net renders multi-scale stylization for any style image in one feed-forward pass. We demonstrate the state-of-the-art effectiveness and efficiency of the proposed method in generating high-quality stylized images, with a series of applications including multiple style integration and video stylization.

Authors (4)
  1. Lu Sheng (63 papers)
  2. Ziyi Lin (12 papers)
  3. Jing Shao (109 papers)
  4. Xiaogang Wang (230 papers)
Citations (283)

Summary

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

The paper "Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration" introduces the Avatar-Net, a framework for achieving efficient and effective zero-shot style transfer which addresses both generalization and efficiency issues that limit previous methods. The authors propose a novel style decorator module that semantically aligns content features with style features from an arbitrary style image, ensuring both holistic feature distribution alignment and preservation of detailed style patterns. This approach facilitates visually plausible stylization across multiple scales within a single feed-forward network pass.

The key innovation of this work lies in the style decorator, which utilizes a patch-based strategy to combine content and style features in a manner that retains the semantic integrity of the content while embedding the style's characteristic patterns. Compared to existing methods like AdaIN and WCT, the style decorator offers superior propagation of detailed style patterns by matching normalized content features with style features in a shared feature space, minimizing bias and enhancing the diversity of rendered style patterns in the output.
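To make the mechanism concrete, here is a minimal PyTorch sketch of the patch-based decoration step, assuming 3x3 patches and mean/std normalization (the paper also describes a ZCA-whitening variant); the function names are illustrative, not the authors' released API.

```python
import torch
import torch.nn.functional as F

def normalize(feat, eps=1e-5):
    # Remove per-channel mean and std so content and style features
    # live in a comparable, style-free space before matching.
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.std(dim=(2, 3), keepdim=True) + eps
    return (feat - mean) / std, mean, std

def style_decorator(content, style, patch_size=3):
    # content, style: (1, C, H, W) feature maps from a shared encoder.
    c_norm, _, _ = normalize(content)
    s_norm, s_mean, s_std = normalize(style)

    # Treat every style patch as a convolution filter; the conv response
    # then scores the similarity between content locations and style patches.
    patches = F.unfold(s_norm, kernel_size=patch_size, padding=patch_size // 2)
    n, c = patches.shape[-1], style.shape[1]
    filters = patches.permute(0, 2, 1).reshape(n, c, patch_size, patch_size)
    norms = filters.flatten(1).norm(dim=1).clamp_min(1e-8).view(n, 1, 1, 1)
    scores = F.conv2d(c_norm, filters / norms, padding=patch_size // 2)

    # Hard-assign each content location to its best-matching style patch.
    best = scores.argmax(dim=1, keepdim=True)
    one_hot = torch.zeros_like(scores).scatter_(1, best, 1.0)

    # Reassemble the feature map from the selected (un-normalized) patches,
    # averaging where neighboring patches overlap.
    out = F.conv_transpose2d(one_hot, filters, padding=patch_size // 2)
    cover = F.conv_transpose2d(one_hot, torch.ones_like(filters),
                               padding=patch_size // 2)
    out = out / cover.clamp_min(1e-8)

    # Re-color the decorated features with the style's channel statistics.
    return out * s_std + s_mean
```

Because the swapped-in patches keep their local structure rather than being reduced to global statistics, detailed style patterns survive into the decorated features, which is the property the paragraph above attributes to the style decorator.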

Avatar-Net employs an hourglass network architecture with skip connections, enabling a multi-scale rendering process integrated with the style decorator module. The style adaptations occur at multiple scales, allowing for effective, simultaneous style transfer at both local and global levels. This approach contrasts with prior single-scale methods and recursive transformation requirements found in approaches like WCT, resulting in improved stylization quality and computational efficiency.
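As a rough illustration of the decoder side of this architecture, the sketch below assumes a VGG-like channel progression and fuses a style-adapted skip connection at each upsampling step; the class name and channel counts are placeholders, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HourglassDecoder(nn.Module):
    # Decode decorated bottleneck features back to an image, concatenating
    # a style-adapted encoder skip at each scale on the way up.
    def __init__(self, channels=(512, 256, 128, 64)):
        super().__init__()
        # After upsampling, the main path (c_in channels) meets a skip
        # connection carrying c_out channels at the same resolution.
        self.fuse = nn.ModuleList([
            nn.Conv2d(c_in + c_out, c_out, kernel_size=3, padding=1)
            for c_in, c_out in zip(channels[:-1], channels[1:])
        ])
        self.to_rgb = nn.Conv2d(channels[-1], 3, kernel_size=3, padding=1)

    def forward(self, bottleneck, styled_skips):
        # styled_skips: encoder features at successively finer scales,
        # already adapted to the style (e.g. via AdaIN, sketched below).
        x = bottleneck
        for fuse, skip in zip(self.fuse, styled_skips):
            x = F.interpolate(x, scale_factor=2, mode="nearest")
            x = F.relu(fuse(torch.cat([x, skip], dim=1)))
        return self.to_rgb(x)
```

Decorating once at the bottleneck and adapting the skips lets coarse, global style abstractions and fine, local ones enter the output at their natural resolutions in a single pass.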

Empirical results demonstrate Avatar-Net's qualitative superiority in generating diverse, high-quality stylized images while maintaining competitive or better execution times relative to state-of-the-art methods, such as Gatys et al., AdaIN, and Style-Swap. Notably, Avatar-Net achieves substantial efficiency improvements, particularly when using AdaIN for whitening and recoloring transformations within the style decorator module, rendering it feasible for real-time applications.
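For reference, AdaIN itself is just per-channel moment matching, which is why it is so much cheaper than full ZCA whitening and coloring; a standard implementation (not taken from the authors' code) looks like this:

```python
import torch

def adain(content, style, eps=1e-5):
    # Adaptive instance normalization (Huang & Belongie, 2017): shift and
    # scale each content channel to match the style's mean and std.
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean
```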

The paper's contributions extend beyond image stylization to include applications such as style interpolation where multiple sources of style are blended seamlessly within a single pass. Moreover, Avatar-Net's architectural design supports video stylization by offering temporal consistency in the synthesized outputs across frames, a feat not robustly achieved by many existing methods.
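One plausible way to realize single-pass style interpolation is to blend the features decorated by each style linearly before decoding; the sketch below follows that reading, which is our assumption rather than a confirmed detail of the paper's pipeline.

```python
import torch

def blend_styles(decorated_feats, weights):
    # decorated_feats: list of (1, C, H, W) tensors, the same content
    # decorated by K different styles; weights: K blending coefficients.
    w = torch.tensor(weights, dtype=decorated_feats[0].dtype)
    w = w / w.sum()  # convex combination keeps feature statistics in range
    return sum(wi * f for wi, f in zip(w, decorated_feats))
```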

In terms of broader implications, Avatar-Net exemplifies an advanced approach to style transfer that holds promise for expanded use in creative industries, enabling rapid prototyping and iteration of stylistic designs without the need to pre-train on specific styles. Future extensions could explore adaptive learning frameworks to further optimize the style decorator mechanism, potentially elevating the robustness and flexibility of style transfer systems. As AI continues to integrate into creative processes, the advancements made by Avatar-Net offer both practical and theoretical groundwork for future developments in computer vision and style transfer methodologies.