- The paper introduces FreezeSG and structure loss to enhance StyleGAN2's ability to preserve source image structures while generating cartoon faces.
- It addresses the data imbalance common in unsupervised image-to-image translation by fine-tuning a pretrained generator and applying layer swapping across diverse domains.
- Experimental results show superior structural integrity and facial feature quality compared to existing baseline techniques.
Fine-tuning StyleGAN2 for Cartoon Face Generation
The paper "Fine-tuning StyleGAN2 for Cartoon Face Generation" proposes methods for unsupervised image-to-image (I2I) translation, focusing on adapting StyleGAN2 to cartoon face generation. The research addresses the difficulty of learning joint distributions across diverse domains, a common hurdle caused by data imbalance between source and target datasets.
Core Contributions
The paper introduces two primary methods to refine the performance of StyleGAN2 in preserving source image structures while generating realistic target domain images:
- FreezeSG: This technique involves freezing the initial generator blocks and style vectors during fine-tuning. This approach aids in maintaining the structural integrity of the source image in the generated target image. The results, when compared to existing methods like FreezeD and FreezeG, show improved structure retention, especially when combined with the Layer Swapping technique.
- Structure Loss: To further enhance structure retention, a loss function enforces similarity between the low-resolution representations of source and target images. Structure loss minimizes the mean squared error (MSE) between the low-resolution feature maps of corresponding layers in the source and target generators, yielding more natural, structure-consistent translations.
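The structure loss described above can be sketched as an MSE penalty over the low-resolution feature maps of the two generators. The `structure_loss` helper, the layer count, and the feature-map shapes below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def structure_loss(source_feats, target_feats, num_low_res_layers=3):
    """Sum the MSE between corresponding low-resolution feature maps of
    the source and target generators. Only the first few (coarse) layers
    are penalized, so fine-grained style details remain free to change."""
    loss = 0.0
    for src, tgt in zip(source_feats[:num_low_res_layers],
                        target_feats[:num_low_res_layers]):
        loss += np.mean((src - tgt) ** 2)
    return loss

# Toy example: feature maps at 4x4, 8x8, and 16x16 resolutions,
# with the target features a slightly perturbed copy of the source.
rng = np.random.default_rng(0)
shapes = [(512, 4, 4), (512, 8, 8), (256, 16, 16)]
src = [rng.standard_normal(s) for s in shapes]
tgt = [f + 0.1 * rng.standard_normal(f.shape) for f in src]
print(structure_loss(src, tgt))  # small, since tgt is close to src
```

In a real training loop the same idea would be applied to intermediate activations of the frozen source generator and the fine-tuned target generator, with the loss added to the usual adversarial objective.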
Experimental Validation
The proposed methods were tested using various datasets, including FFHQ as the source domain, and Naver Webtoon, Metfaces, and Disney datasets as target domains. These datasets provided a comprehensive evaluation platform spanning diverse stylistic facial features. The FreezeSG method demonstrated superior structural integrity of source images, while the Structure Loss method provided enhanced quality in facial regions like jaws and heads.
Results
The empirical results showed the proposed methods outperforming baseline techniques (e.g., FreezeD + ADA) and alternatives like FreezeG. By integrating the novel approaches with Layer Swapping, the paper achieved significant improvements in synthesizing target domain images that closely resemble the source domain structure.
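Layer swapping, used here in combination with the proposed methods, builds a blended generator by taking coarse (low-resolution) layers from the source model and fine (high-resolution) layers from the fine-tuned model. A minimal sketch, assuming checkpoints can be viewed as per-resolution weight dictionaries (the `layer_swap` helper and the 32x32 cut-off are hypothetical choices, not the paper's exact configuration):

```python
def layer_swap(source_weights, target_weights, swap_resolution=32):
    """Blend two generators by resolution: layers at or below
    `swap_resolution` come from the source model (preserving structure),
    the rest from the fine-tuned target model (providing style)."""
    blended = {}
    for res in source_weights:
        blended[res] = (source_weights[res] if res <= swap_resolution
                        else target_weights[res])
    return blended

# Toy example: strings stand in for per-resolution weight tensors.
resolutions = [4, 8, 16, 32, 64, 128, 256]
source = {res: f"ffhq_{res}" for res in resolutions}
target = {res: f"cartoon_{res}" for res in resolutions}
blended = layer_swap(source, target, swap_resolution=32)
print(blended[8], blended[256])  # ffhq_8 cartoon_256
```

The swap resolution controls the trade-off: a higher cut-off keeps more of the source structure, a lower one lets more of the target style through.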
Implications and Future Directions
The research presents methods that effectively reduce computational costs and improve training efficiency in I2I translation tasks. However, fine-tuning demands specific layer adjustments for each dataset, a limitation noted by the authors. Future work could focus on automating layer adjustment processes or developing more generalized models adaptable across various domains without extensive manual configuration.
Additionally, the exploration of StyleCLIP integration highlights potential for further innovations in text-guided manipulations, allowing dynamic and expressive control over generated cartoon faces.
In conclusion, the advancements presented in this paper serve as a meaningful contribution to the field of generative models and unsupervised I2I translation, offering robust solutions for stylized image synthesis tasks. As research progresses, these methodologies could pave the way for more versatile applications in both academic and commercial domains.