Unsupervised Discovery of Disentangled Manifolds in GANs

Published 24 Nov 2020 in cs.CV (arXiv:2011.11842v2)

Abstract: As recent generative models can generate photo-realistic images, people seek to understand the mechanism behind the generation process. An interpretable generation process is beneficial to various image editing applications. In this work, we propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks. We propose to learn the transformation from prior one-hot vectors representing different attributes to the latent space used by pre-trained models. Furthermore, we apply a centroid loss function to improve consistency and smoothness while traversing through different directions. We demonstrate the efficacy of the proposed framework on a wide range of datasets. The discovered direction vectors are shown to correspond visually to various distinct attributes and thus enable attribute editing.

Citations (7)

Summary

  • The paper introduces an unsupervised framework to discover interpretable latent directions in GANs, enabling controlled image attribute manipulation.
  • It leverages a deformator and a reconstructor with a novel centroid loss to ensure consistent and semantically meaningful transformations.
  • The method demonstrates improved RCA and PPL metrics across datasets such as Anime Faces, ILSVRC, and FFHQ, enhancing latent space disentanglement.

Introduction

The paper "Unsupervised Discovery of Disentangled Manifolds in GANs" (2011.11842) addresses the challenge of interpretable manipulation in Generative Adversarial Networks (GANs). The interpretability of latent spaces in GANs is critical for various applications in image editing, where controlled manipulation of attributes is desired. Current GAN models often lack this interpretable structure, necessitating methods to disentangle and identify latent attributes without supervision.

Methodology

The authors propose a framework to discover interpretable directions in the latent space of GANs in an unsupervised manner. The core methodology includes a novel approach toward discovering these directions by leveraging a pre-trained generator alongside a deformator and a reconstructor network.

  • Latent Space Exploration: The method samples latent vectors, shifts them along candidate directions, and generates images from both the original and shifted codes to observe the induced changes.
  • Deformator and Reconstructor: The system employs a deformator to map editing attributes and magnitudes onto the latent space and a reconstructor to predict the intended manipulation attributes based on image transformations.
  • Centroid Loss: A centroid loss function improves the consistency of translations in the latent space, ensuring that shifted codes exhibit semantically meaningful directions (Figure 1).

    Figure 1: Interpretable latent space discovery. The proposed method explores the interpretable directions in the latent space of pretrained models in an unsupervised manner. With the mined interpretable directions, images can be manipulated by changing different attributes smoothly.
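As a rough sketch of the components above (not the authors' code; the exact architecture and loss in the paper may differ), the deformator can be modeled as a map from a one-hot attribute vector scaled by an edit magnitude to a latent shift, and the centroid loss as a penalty keeping shifted codes for the same direction close to their centroid. A minimal NumPy version with a linear deformator:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, num_dirs = 16, 5

# Hypothetical linear deformator: each column is one candidate latent direction,
# selected by a one-hot attribute vector and scaled by an edit magnitude.
A = rng.standard_normal((latent_dim, num_dirs))
A /= np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm directions

def deform(z, k, eps):
    """Shift latent code z along direction k with magnitude eps."""
    return z + eps * A[:, k]

def centroid_loss(shifted):
    """Illustrative centroid loss: shifted codes for the same direction
    should stay close to their centroid, encouraging consistent,
    smooth traversals."""
    c = shifted.mean(axis=0)
    return np.mean(np.sum((shifted - c) ** 2, axis=1))

z = rng.standard_normal((8, latent_dim))          # batch of latent samples
shifted = np.stack([deform(zi, k=2, eps=3.0) for zi in z])
print(centroid_loss(shifted))
```

In the full framework, the reconstructor would take the images generated from `z` and `shifted` and predict both the direction index `k` and the magnitude `eps`, and the deformator and reconstructor would be trained jointly against that objective plus the centroid term.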

Experimentation

The paper validates the proposed framework across multiple datasets: Anime Faces, ILSVRC-ImageNet, and Flickr-Faces-HQ. The evaluation metrics include Reconstructor Classification Accuracy (RCA) and Perceptual Path Length (PPL), providing detailed insights into the model's performance.

  • RCA and PPL Metrics: RCA measures the accuracy of direction prediction in image transformations, while PPL assesses the smoothness of transitions in generated images (Figure 2).

    Figure 2: Concept Illustration. The method aims to discover interpretable directions in latent space in an unsupervised manner, transforming images along semantic attributes.
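The two metrics can be sketched as follows. RCA is the fraction of direction indices the reconstructor predicts correctly; PPL, following the StyleGAN-style definition, is the mean perceptual distance between images at nearby interpolation points, scaled by 1/ε². This is an illustrative stand-in: a plain squared error replaces the LPIPS distance used in practice.

```python
import numpy as np

def rca(predicted_dirs, true_dirs):
    """Reconstructor Classification Accuracy: fraction of correctly
    predicted direction indices."""
    return np.mean(np.asarray(predicted_dirs) == np.asarray(true_dirs))

def ppl(images_t, images_t_eps, eps=1e-4):
    """Perceptual Path Length sketch: mean distance between images at
    interpolation points t and t + eps, scaled by 1/eps**2. A plain
    per-image squared error stands in for the LPIPS distance."""
    d = np.mean((images_t - images_t_eps) ** 2, axis=(1, 2, 3))
    return np.mean(d) / eps ** 2

# Toy usage with random "images" of shape (batch, H, W, channels).
rng = np.random.default_rng(1)
imgs = rng.random((4, 8, 8, 3))
print(rca([0, 1, 2, 2], [0, 1, 2, 3]))  # 3 of 4 correct -> 0.75
print(ppl(imgs, imgs + 1e-4 * rng.random((4, 8, 8, 3))))
```

Lower PPL indicates smoother traversals (small latent steps cause small perceptual changes), while higher RCA indicates the discovered directions remain distinguishable from one another.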

  • Visual Results: The authors showcase transformations using Spectral Norm GAN, BigGAN, and StyleGAN2, revealing the system's capability to change attributes such as hair color, expression, and object motion (Figures 3 and 4).

    Figure 3: The proposed framework. The framework includes a pre-trained generator, deformator, and reconstructor. It illustrates the interactions among elements for discovering and applying latent space directions.

    Figure 4: Sampled results. Examples of interpretable directions for Spectral Norm GAN on the Anime Faces dataset. Rows show the transformation of images along attribute directions.

Quantitative Analysis

The comparative experiments exhibit improvements over baseline methods, particularly in maintaining discriminability and smooth transitions. The innovative centroid loss significantly enhances the perceptual quality and diversity within generated image sequences.

  • Performance Gains: The method achieves competitive RCA scores while demonstrating marked improvements in PPL, underscoring the effectiveness of the centroid loss in aligning attribute directions with perceptual coherence (Figure 5).

    Figure 5: Diversity of discovered directions. Examples from the ILSVRC and FFHQ datasets show the diversity of attributes discovered using pre-trained BigGAN and StyleGAN2.

Conclusion

The proposed framework presents a significant advancement in the discovery of interpretable latent directions in GANs. By introducing unsupervised disentanglement strategies and incorporating centroid loss, the paper offers a robust solution for generating images with diverse semantic attributes. Future work could expand on these findings by exploring further latent space strategies and integrating with other generative models to enhance versatility and interpretability in image synthesis applications.