- The paper demonstrates that contrastive learning with pretrained generative models enables effective disentangled representation learning without the need for extra regularization.
- It introduces a Navigator module and a Δ-Contrastor to systematically discover semantically meaningful traversal directions in the latent space.
- Empirical evaluations on Cars3D, Shapes3D, and MPI3D show significant improvements in the MIG and DCI metrics over prior disentanglement methods, and a further evaluation on FFHQ with StyleGAN2 sets the state of the art on the MDS metric.
Overview of DisCo: Learning Disentangled Representations with Pretrained Generative Models
This paper presents Disentanglement via Contrast (DisCo), a framework that exploits pretrained generative models for disentangled representation learning. It challenges existing paradigms that rely on additional disentanglement constraints during the training of generative models, which often force a trade-off between image quality and representation disentanglement. The key innovation is to keep a high-fidelity generative model trained without any explicit disentanglement term, and to focus learning instead on discovering traversal directions in its latent space that correspond to semantically disentangled factors.
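To ground the core idea, here is a minimal, illustrative sketch of what "traversing a direction in the latent space" means; the generator `G` below is a toy stand-in, not the pretrained models (e.g., StyleGAN2) actually used in the paper.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained generator (GAN, VAE decoder, or flow model) trained
# WITHOUT any disentanglement term; a real run would load frozen StyleGAN2 weights.
latent_dim, img_pixels = 512, 3 * 64 * 64
G = nn.Sequential(nn.Linear(latent_dim, img_pixels), nn.Tanh())

num_dirs = 64
directions = torch.randn(num_dirs, latent_dim)
directions = directions / directions.norm(dim=1, keepdim=True)  # unit-norm directions

z = torch.randn(1, latent_dim)          # sampled latent code
k, alpha = 3, 2.0                       # chosen direction index and step size
z_shifted = z + alpha * directions[k]   # traverse direction k in latent space

# If direction k is well disentangled, the pair differs in exactly one
# semantic factor (e.g., object pose) while everything else stays fixed.
img_a, img_b = G(z), G(z_shifted)
```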
Motivation and Methodological Framework
A fundamental task in representation learning is disentangling the explanatory factors of the observed data. Traditionally, VAE-based and InfoGAN-based models incorporate additional regularization terms, such as total-correlation or mutual-information penalties, to promote disentanglement. While these methods have shown promise, they typically suffer from an inherent compromise between disentanglement quality and the fidelity of generated images. This paper posits that leveraging pretrained generative models, which are already capable of high-quality synthesis, offers a fresh way to sidestep this trade-off.
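For concreteness, these regularizers commonly take the following forms; these are the standard FactorVAE and InfoGAN formulations from the literature, not equations taken from this paper.

```latex
% FactorVAE-style: penalize the total correlation of the aggregate posterior
% so that the latent dimensions become statistically independent.
\mathcal{L}_{\text{FactorVAE}}
  = \mathcal{L}_{\text{VAE}}
  + \gamma \, \mathrm{KL}\!\left( q(z) \,\middle\|\, \textstyle\prod_{j} q(z_j) \right)

% InfoGAN-style: add a mutual-information term tying latent codes c
% to the generated images G(z, c).
\min_{G} \max_{D} \; V(D, G) - \lambda \, I\big(c;\, G(z, c)\big)
```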
DisCo adopts a contrastive learning approach that focuses on the variations between image pairs generated by traversing discovered directions in the latent space of a pretrained model (a GAN, VAE, or flow-based model). This is operationalized via a Navigator module that proposes traversal directions and a Δ-Contrastor that builds a variation space by encoding each image pair and contrasting the resulting direction-induced variations. Two additional techniques, an entropy-based domination loss and a hard-negatives flipping strategy, are integrated into DisCo to further strengthen disentangled representation learning; a schematic sketch follows below.
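The sketch below shows how these pieces might fit together. It is a simplified reconstruction under stated assumptions (a linear Navigator, an MLP encoder, a toy stand-in generator, and a plain InfoNCE-style loss), and it omits the entropy-based domination loss and the hard-negatives flipping strategy described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, feat_dim, num_dirs = 512, 256, 64
img_pixels = 3 * 64 * 64

# Frozen pretrained-generator stand-in (see the sketch above); only the
# Navigator and encoder are trained.
G = nn.Sequential(nn.Linear(latent_dim, img_pixels), nn.Tanh())
for p in G.parameters():
    p.requires_grad_(False)

# Navigator: proposes num_dirs traversal directions (linear variant here;
# the paper also considers nonlinear Navigators).
navigator = nn.Linear(num_dirs, latent_dim, bias=False)

# Encoder inside the Delta-Contrastor; a real setup would use a CNN over images.
encoder = nn.Sequential(nn.Linear(img_pixels, feat_dim), nn.ReLU(),
                        nn.Linear(feat_dim, feat_dim))

def disco_step(z, temperature=0.5):
    """One contrastive step: variations induced by the SAME direction index
    are positives; variations from different directions are negatives."""
    B = z.shape[0]
    k = torch.randint(num_dirs, (B,))              # direction index per sample
    alpha = torch.rand(B, 1) * 4.0 - 2.0           # random step size in [-2, 2]
    d = navigator(F.one_hot(k, num_dirs).float())  # proposed directions
    # Delta-Contrastor: represent a *variation* as the difference between
    # the encoded features of the shifted and original images.
    v = encoder(G(z + alpha * d)) - encoder(G(z))
    v = F.normalize(v, dim=1)
    logits = v @ v.t() / temperature               # pairwise similarities
    eye = torch.eye(B, dtype=torch.bool)
    logits = logits.masked_fill(eye, -1e9)         # exclude self-pairs
    pos = (k[:, None] == k[None, :]).float().masked_fill(eye, 0.0)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

loss = disco_step(torch.randn(32, latent_dim))     # usage: one training batch
loss.backward()
```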
Empirical Evaluation
The DisCo framework's efficacy is evaluated across multiple generative models on standard disentanglement datasets (Cars3D, Shapes3D, and MPI3D). The results consistently favor DisCo over typical disentanglement techniques and other direction-discovery methods, with marked improvements in metrics such as the Mutual Information Gap (MIG) and the Disentanglement-Completeness-Informativeness (DCI) score demonstrating its ability to extract disentangled representations.
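As a reference point, MIG is commonly computed as the normalized gap between the two latent dimensions most informative about each ground-truth factor. The sketch below follows that standard definition (with histogram discretization of continuous latents) and makes no claim about the exact evaluation code used in the paper.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mig(latents, factors, bins=20):
    """Mutual Information Gap.

    latents: (N, num_latents) continuous codes; factors: (N, num_factors)
    integer-coded ground-truth factors. Continuous latents are discretized
    by histogram binning, a common evaluation convention.
    """
    latents_d = np.stack(
        [np.digitize(z, np.histogram(z, bins)[1][:-1]) for z in latents.T], axis=1)
    gaps = []
    for f in factors.T:
        # Mutual information between this factor and every latent dimension.
        mi = sorted((mutual_info_score(f, zc) for zc in latents_d.T), reverse=True)
        p = np.bincount(f) / len(f)                  # factor distribution
        h = -(p[p > 0] * np.log(p[p > 0])).sum()     # factor entropy (nats)
        gaps.append((mi[0] - mi[1]) / h)             # normalized top-2 gap
    return float(np.mean(gaps))
```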
Furthermore, DisCo's utility extends to uncovering semantically meaningful directions in the latent space of StyleGAN2 on the FFHQ dataset, as measured by the Manipulation Disentanglement Score (MDS). The method achieves state-of-the-art results in both manipulated image quality and the precision of direction discovery.
Theoretical and Practical Implications
DisCo proposes a unified framework that extends the usability of pretrained generative models to disentangled representation learning, highlighting a lesser-explored synergy between high-quality image synthesis and factor disentanglement that requires no retraining of generative models with extra regularizers. The implications are significant: such an approach could streamline workflows in domains where both high-resolution synthesis and controllable, factorized editing are paramount, as in computer graphics and generative art.
Future Directions
Potential developments could broaden the range of generative architectures that DisCo supports and extend its application to complex real-world datasets. Moreover, integrating DisCo with architectures that combine latent-space generation with explicit feature disentanglement could yield further gains for AI systems geared toward human-like understanding and creativity. On the theoretical side, future work could analyze more precisely why and when contrastive objectives recover disentangled directions in unsupervised and self-supervised settings.