- The paper introduces AniGAN, a framework that translates portrait photos into anime faces using a novel generator architecture and a double-branch discriminator design.
- The paper reinterprets local facial features as style elements, enabling effective shape transformation without using facial landmarks.
- The paper demonstrates that two new normalization functions, evaluated on a newly collected face2anime dataset, yield improved FID and LPIPS scores over state-of-the-art methods.
Overview of AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation
The paper "AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation" introduces a novel approach for photo-to-anime face translation, termed as Style-Guided Face-to-Anime Translation (StyleFAT). This method leverages Generative Adversarial Networks (GANs) to transform a given portrait photo into an anime-like face while maintaining style-consistency with a reference anime face. The framework addresses unique challenges within this task related to the diverse and complex variations in anime styles, which include differences in color, texture, and local shape alterations.
Key Contributions
- Novel Generator and Discriminator Architectures: The authors propose a generator that transfers color/texture styles and converts the local shapes of facial features into anime-like counterparts while preserving the global structure of the source photo. They also introduce a double-branch discriminator that simultaneously learns domain-specific distributions and a domain-shared distribution, reducing artifacts and improving quality (see the discriminator sketch after this list).
- Unique Handling of Local Structures: Unlike existing methods that focus on preserving local structures, AniGAN's generator reinterprets local facial shapes as a form of style, facilitating their transformation via style transfer. This is achieved without relying on facial landmarks or parsing.
- Two Novel Normalization Functions: The paper introduces point-wise layer instance normalization (PoLIN) and adaptive point-wise layer instance normalization (AdaPoLIN), which strengthen the transformation of local face shapes and the transfer of anime styles (sketched after this list).
- Dataset and Evaluation: A new face2anime dataset covering diverse anime styles is introduced to evaluate the method. Experiments on this dataset and on selfie2anime show that AniGAN outperforms state-of-the-art techniques both qualitatively and quantitatively, including on FID and LPIPS scores (a metric-computation sketch follows the list).
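The double-branch discriminator can be pictured as a shared convolutional trunk feeding two domain-specific heads: the trunk captures statistics common to photos and anime faces, while each head judges realism within its own domain. The sketch below is a plausible reading of that design, assuming patch-level real/fake logits; channel counts and depths are hypothetical, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DoubleBranchDiscriminator(nn.Module):
    """Shared shallow layers learn the domain-shared distribution; two separate
    heads learn the photo-specific and anime-specific distributions."""
    def __init__(self, ch=64):
        super().__init__()
        # Domain-shared trunk applied to images from either domain.
        self.shared = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch, 2 * ch, 4, 2, 1), nn.LeakyReLU(0.2, True),
        )
        # Domain-specific branches, one per domain.
        def branch():
            return nn.Sequential(
                nn.Conv2d(2 * ch, 4 * ch, 4, 2, 1), nn.LeakyReLU(0.2, True),
                nn.Conv2d(4 * ch, 1, 4, 1, 1),  # patch-level real/fake logits
            )
        self.photo_branch, self.anime_branch = branch(), branch()

    def forward(self, x, domain):
        h = self.shared(x)
        return self.photo_branch(h) if domain == "photo" else self.anime_branch(h)

logits = DoubleBranchDiscriminator()(torch.randn(2, 3, 128, 128), domain="anime")
```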
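PoLIN and AdaPoLIN both compute instance-normalized and layer-normalized versions of the same feature map and fuse them with a point-wise (1x1) convolution, letting each channel pick its own mix of the two statistics; AdaPoLIN additionally applies a scale and bias predicted from the style code. The sketch below reflects this reading; details such as the fusion convolution's initialization and the style-affine parameterization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def _in_ln(x, eps=1e-5):
    """Instance norm (per channel) and layer norm (per sample) of the same features."""
    inorm = F.instance_norm(x, eps=eps)
    mean = x.mean(dim=(1, 2, 3), keepdim=True)
    var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
    lnorm = (x - mean) / torch.sqrt(var + eps)
    return inorm, lnorm

class PoLIN(nn.Module):
    """Point-wise layer instance normalization: fuses IN and LN via 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1, bias=False)

    def forward(self, x):
        inorm, lnorm = _in_ln(x)
        return self.fuse(torch.cat([inorm, lnorm], dim=1))

class AdaPoLIN(nn.Module):
    """Adaptive variant: scale and bias come from the style code, so the
    reference anime face drives the re-stylized feature statistics."""
    def __init__(self, channels, style_dim=256):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1, bias=False)
        self.affine = nn.Linear(style_dim, 2 * channels)

    def forward(self, x, style):
        gamma, beta = self.affine(style).chunk(2, dim=1)
        inorm, lnorm = _in_ln(x)
        h = self.fuse(torch.cat([inorm, lnorm], dim=1))
        return gamma[:, :, None, None] * h + beta[:, :, None, None]

x, s = torch.randn(2, 128, 32, 32), torch.randn(2, 256)
y = AdaPoLIN(128)(x, s)  # style-conditioned normalization
z = PoLIN(128)(x)        # style-free fusion of IN and LN
```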
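For the reported metrics, FID measures distributional similarity between generated and real anime faces (lower is better), while LPIPS measures perceptual distance between image pairs. The paper does not prescribe a library for this; the snippet below is merely one convenient way to compute both scores with torchmetrics, using random tensors as stand-ins for real data and generator outputs.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

# FID compares Inception feature statistics of real vs. generated images.
fid = FrechetInceptionDistance(feature=2048, normalize=True)  # float images in [0, 1]
real_anime = torch.rand(16, 3, 128, 128)  # stand-in for real dataset batches
generated = torch.rand(16, 3, 128, 128)   # stand-in for generator outputs
fid.update(real_anime, real=True)
fid.update(generated, real=False)
print("FID:", fid.compute().item())

# LPIPS between outputs produced from different reference styles is commonly
# used as a diversity score; here two random batches stand in for such pairs.
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
print("LPIPS:", lpips(generated, torch.rand(16, 3, 128, 128)).item())
```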
Implications and Future Directions
AniGAN contributes significantly to unsupervised domain translation, especially in tasks with large intra-domain variation and significant stylistic shifts between domains. The approach can be extended and adapted to other cross-domain translations involving high stylistic variance, and the proposed normalization functions and generator/discriminator architectures could be explored in other domains, such as caricature generation or non-photorealistic animation.
In terms of practical implications, the ability to generate high-quality, style-consistent anime faces from photos has potential applications in personalized social media content creation, entertainment, and even custom video game avatars. Moreover, the method's unsupervised nature removes the dependency on paired datasets, making it cost-effective and broadly applicable.
Overall, this work presents a comprehensive and well-founded approach to tackling the complex task of photo-to-anime face translation, providing both theoretical advancements and practical tools for future research in the field of cross-domain image generation.