SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
The paper "SemanticStyleGAN" introduces a novel architecture for Generative Adversarial Networks (GANs) aimed at achieving fine-grained, controllable image synthesis and editing. This model addresses the limitations of existing StyleGANs, which are inherently constrained by their ability to manipulate global image styles but lack precise control over local elements due to ambiguous latent codes.
Contributions
- Compositional Generator Architecture: SemanticStyleGAN introduces a generator that disentangles the latent space into local semantic areas governed by semantic segmentation masks. This decomposition enables separate control over the structure and texture of individual image components, such as the face, hair, and eyes (a minimal sketch of the fusion mechanism follows this list).
- GAN Training Framework: The model jointly learns to generate images and their corresponding semantic segmentation masks, keeping the two semantically consistent throughout the generation process.
- Decoupled Downstream Editing: Designed to integrate with existing latent space manipulation techniques, SemanticStyleGAN enables more precise editing of real or synthesized images, reducing the unwanted correlations that entangle edits in StyleGAN's latent space (the sketch below ends with such a local edit).
- Domain Adaptation via Transfer Learning: Experiments demonstrate that SemanticStyleGAN transfers beyond its initial training domain, retaining its spatial disentanglement after minimal re-training even in data-limited scenarios.
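To make the compositional idea concrete, here is a minimal PyTorch sketch of the per-region fusion mechanism described above. It is an illustration under assumptions, not the authors' implementation: `LocalGenerator`, `fuse`, and all dimensions are hypothetical, and the real model uses modulated convolutions, Fourier-feature inputs, and a shared render network rather than plain linear layers. The final lines show a local edit: replacing one region's latent code leaves the other regions untouched.

```python
# Minimal sketch of compositional generation with per-region latents.
# Hypothetical names and sizes; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalGenerator(nn.Module):
    """Maps one region's latent code to a feature map plus a pseudo-depth map.

    The latent is split in half: the first half controls structure (shape),
    so only it feeds the depth branch; the full code controls texture.
    """
    def __init__(self, latent_dim=64, feat_dim=32, size=16):
        super().__init__()
        self.feat_dim, self.size = feat_dim, size
        self.to_feat = nn.Linear(latent_dim, feat_dim * size * size)
        self.to_depth = nn.Linear(latent_dim // 2, size * size)

    def forward(self, w):
        b = w.size(0)
        feat = self.to_feat(w).view(b, self.feat_dim, self.size, self.size)
        depth = self.to_depth(w[:, : w.size(1) // 2]).view(b, 1, self.size, self.size)
        return feat, depth

def fuse(feats, depths):
    """Softmax over per-region pseudo-depths yields soft masks summing to 1;
    masked features are summed into one map for a downstream render net."""
    masks = F.softmax(torch.cat(depths, dim=1), dim=1)        # (B, K, H, W)
    fused = sum(m.unsqueeze(1) * f
                for m, f in zip(masks.unbind(dim=1), feats))  # (B, C, H, W)
    return fused, masks

K, B = 3, 2                                    # e.g. skin, hair, background
gens = [LocalGenerator() for _ in range(K)]
ws = [torch.randn(B, 64) for _ in range(K)]    # independent per-region codes
feats, depths = zip(*(g(w) for g, w in zip(gens, ws)))
fused, masks = fuse(feats, depths)

# A local edit: resample only the latent for region 1 ("hair"); regions 0
# and 2 keep their codes, so their structure and texture are preserved.
ws[1] = torch.randn(B, 64)
feats, depths = zip(*(g(w) for g, w in zip(gens, ws)))
fused_edit, masks_edit = fuse(feats, depths)
print(fused.shape, masks.shape)                # (2, 32, 16, 16), (2, 3, 16, 16)
```

Note how the soft masks fall out of the pseudo-depth comparison for free: the same forward pass yields both the fused features and a segmentation, which is what allows images and masks to be generated, and therefore supervised, jointly, and why replacing a single latent code changes only its region.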
Experimental Results
SemanticStyleGAN's performance is quantified with standard quality metrics, Fréchet Inception Distance (FID) and Inception Score (IS), and shows synthesis quality competitive with StyleGAN2. More importantly, its architecture cleanly separates local features, enabling interpretable latent space navigation and the manipulation of individual attributes, a notable advance in GAN-based image synthesis.
The paper reports that SemanticStyleGAN achieves an FID of 7.22 and an IS of 3.47 at 512×512 resolution, close to StyleGAN2's FID of 6.47 and IS of 3.55. These results indicate that SemanticStyleGAN gives up little synthesis quality in exchange for its added control over specific image attributes.
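For context on how such numbers are measured, below is a brief sketch of an FID computation assuming the `torchmetrics` package; the paper does not specify its evaluation tooling, and the tensors here are dummy stand-ins (real evaluations compare tens of thousands of dataset images against generator samples).

```python
# Sketch of an FID measurement with torchmetrics (one of several tools).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)  # Inception-v3 pooled features

# Dummy stand-ins for real dataset images and generator outputs,
# as uint8 tensors in [0, 255] with shape (N, 3, H, W).
real_images = torch.randint(0, 256, (32, 3, 512, 512), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (32, 3, 512, 512), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {fid.compute().item():.2f}")  # lower is better
```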
Implications and Future Directions
By establishing a compositional methodology rooted in semantic understanding, SemanticStyleGAN sets a precedent for interpretable, controllable image generation models. It points toward a future in which GANs more effectively bridge the gap between generative modeling and precise, user-directed outcomes. The implications are broad, from creative industries leveraging photo-realistic synthesis to semantic-driven design applications that require flexible user intervention.
Future work might refine this paradigm with additional regularization or semi-supervised learning strategies, which will be crucial for scaling the model to more complex datasets without exhaustive supervision. Tackling the residual correlations in the latent space could further improve fine-tuning across diverse domains. The modular nature of SemanticStyleGAN's architecture also suggests broad utility in generating personalized digital content.