Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning (2205.09542v2)

Published 19 May 2022 in cs.CV and cs.GR

Abstract: In this work, we tackle the challenging problem of arbitrary image style transfer using a novel style feature representation learning method. A suitable style representation, as a key component in image stylization tasks, is essential to achieve satisfactory results. Existing deep neural network based approaches achieve reasonable results with the guidance from second-order statistics such as Gram matrix of content features. However, they do not leverage sufficient style information, which results in artifacts such as local distortions and style inconsistency. To address these issues, we propose to learn style representation directly from image features instead of their second-order statistics, by analyzing the similarities and differences between multiple styles and considering the style distribution. Specifically, we present Contrastive Arbitrary Style Transfer (CAST), which is a new style representation learning and style transfer method via contrastive learning. Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer. We conduct qualitative and quantitative evaluations comprehensively to demonstrate that our approach achieves significantly better results compared to those obtained via state-of-the-art methods. Code and models are available at https://github.com/zyxElsa/CAST_pytorch

Authors (7)
  1. Yuxin Zhang (91 papers)
  2. Fan Tang (46 papers)
  3. Weiming Dong (50 papers)
  4. Haibin Huang (60 papers)
  5. Chongyang Ma (52 papers)
  6. Tong-Yee Lee (21 papers)
  7. Changsheng Xu (100 papers)
Citations (135)

Summary

Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning

The paper "Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning" addresses the problem of arbitrary image style transfer by introducing a novel framework, built upon a deep understanding of style representation in neural networks. The conventional methods for style transfer have predominantly focused on using second-order statistics such as Gram matrices, which often fail to efficiently capture localized style details and can result in style inconsistencies or artifacts. The authors propose a new approach that bypasses these limitations through the application of contrastive learning techniques in conjunction with a domain enhancement strategy.

The core contribution of the paper is the Contrastive Arbitrary Style Transfer (CAST) method, which consists of several key components aimed at improving style representation. The multi-layer style projector (MSP) encodes style by exploiting relationships between style features directly, rather than relying solely on statistical distributions. The domain enhancement module learns the distribution of image domains, distinguishing realistic from artistic images through adversarial learning. Combined with a contrastive learning objective, these components substantially improve the quality of style transfer.
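
The authors' exact objectives are defined in the linked repository; as a rough illustration of what a contrastive style objective looks like, below is a generic InfoNCE-style loss over style codes. The function name, the temperature `tau`, and the tensor shapes are assumptions for illustration, not CAST's actual implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_style_loss(anchor: torch.Tensor,
                           positive: torch.Tensor,
                           negatives: torch.Tensor,
                           tau: float = 0.07) -> torch.Tensor:
    """Generic InfoNCE loss over style codes (illustrative sketch).

    anchor:    (D,)   style code of a stylized output
    positive:  (D,)   style code of the reference style image
    negatives: (N, D) style codes of images from other styles
    """
    anchor = F.normalize(anchor, dim=0)
    positive = F.normalize(positive, dim=0)
    negatives = F.normalize(negatives, dim=1)

    pos_sim = torch.dot(anchor, positive) / tau          # pull same style together
    neg_sim = negatives @ anchor / tau                   # push other styles apart
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim])  # positive at index 0
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits.unsqueeze(0), target)
```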

A notable strength of CAST is its ability to differentiate individual style characteristics, enabling the generation of high-quality stylized images that retain both the intricate details of artistic styles and the structure of the original content image. Through comprehensive qualitative and quantitative evaluations, the authors demonstrate that CAST outperforms state-of-the-art methods at maintaining content structure while accurately transferring style details.

Numerical results show clear improvements in metrics such as content loss and perceptual distance (LPIPS), and these gains align closely with human preferences in the reported user studies. Together with high deception rates measured by style classification networks, the studies indicate that CAST produces results whose authenticity is comparable to real artworks.
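
For readers who want to reproduce the perceptual-distance measurement, LPIPS is available as the third-party `lpips` PyTorch package; the snippet below is a generic usage sketch with placeholder tensors, not the paper's evaluation pipeline:

```python
import torch
import lpips  # pip install lpips

# LPIPS perceptual distance between two images.
# Inputs are RGB tensors scaled to [-1, 1], shape (N, 3, H, W).
loss_fn = lpips.LPIPS(net='alex')

img0 = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder for a stylized result
img1 = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder for the content image

with torch.no_grad():
    dist = loss_fn(img0, img1)
print(f"LPIPS distance: {dist.item():.4f}")  # lower = perceptually closer
```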

The research carries implications for both theory and practice. Theoretically, direct feature-based style learning via contrastive methods could inspire further work in other areas of computer vision where style or domain translation plays a pivotal role. Practically, CAST lends itself to applications in digital content creation, virtual reality, and multimedia entertainment, where seamlessly blending artistic content into various domains is desirable.

Looking forward, enhancements could include more nuanced forms of contrastive learning that allow dynamic style adaptation across broader categories, or adapting the framework for real-time applications. Augmenting the MSP with additional contextual information about style categories might refine the method further.

In essence, this paper introduces a well-rounded approach that effectively addresses some of the inherent flaws in previous arbitrary style transfer methods by employing a novel use of contrastive learning and domain-specific enhancements, ultimately broadening the scope and applicability of neural style transfer techniques.
