Disentangling by Factorising: An In-Depth Review
"Disentangling by Factorising" by Hyunjik Kim and Andriy Mnih presents a method for learning disentangled representations without supervision. The paper introduces FactorVAE, an approach that extends the β-Variational Autoencoder (β-VAE) with an additional penalty term that encourages the aggregate latent distribution to be factorial. This review offers an expert examination of the methodologies, results, and implications presented in the paper.
Background and Motivation
Disentangled representations aim to isolate independent factors of variation in data, which can substantially enhance downstream tasks such as supervised learning, transfer learning, and zero-shot learning. Traditional approaches often rely on supervision or semi-supervised learning to disentangle factors, which can be cost-prohibitive due to the necessity of labeled data. The β-VAE represents a significant stride in unsupervised disentangling but necessitates a trade-off between disentanglement and reconstruction quality, a challenge FactorVAE seeks to address.
FactorVAE Methodology
FactorVAE is structured on the VAE framework but introduces a new term to its objective function: a penalty that minimizes the Total Correlation (TC) of the joint latent distribution. The TC penalty is designed to push the joint distribution towards a factorial distribution by penalizing the Kullback-Leibler (KL) divergence between the joint distribution and the product of its marginal distributions. This approach is inspired by principles from the Generative Adversarial Network (GAN) literature, employing a discriminator network to approximate the density ratio required for the TC computation.
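The density-ratio trick can be made concrete with a short sketch. Samples from the product of marginals are obtained by shuffling each latent dimension independently across the batch, and the discriminator's logit difference then estimates the log density ratio. This is an illustrative numpy sketch, not the authors' code; the names `permute_dims` and `tc_estimate` are assumptions, and the discriminator itself is left abstract:

```python
import numpy as np

def permute_dims(z, rng):
    """Shuffle each latent dimension independently across the batch.

    The permuted batch approximates samples from the product of
    marginals prod_j q(z_j); the discriminator is trained to tell
    these apart from samples of the joint q(z).
    """
    z_perm = np.empty_like(z)
    for j in range(z.shape[1]):
        z_perm[:, j] = z[rng.permutation(z.shape[0]), j]
    return z_perm

def tc_estimate(logits):
    """Density-ratio trick: given two-class discriminator logits
    (column 0 for "sample from q(z)", column 1 for "sample from the
    product of marginals"), the logit difference approximates
    log q(z) / prod_j q(z_j), whose expectation is the total
    correlation."""
    return np.mean(logits[:, 0] - logits[:, 1])
```

In the paper, the discriminator is trained jointly with the VAE on this binary classification task, so no explicit density estimation of q(z) is ever required.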
The objective function for FactorVAE can be written as: $\mathcal{L} = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] - \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big) - \gamma \, \mathrm{KL}\big(q(z) \,\big\|\, \textstyle\prod_j q(z_j)\big)$
Here, γ is a weight parameter controlling the influence of the TC penalty.
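Assembling the objective from its three terms is then straightforward: it is the standard VAE evidence lower bound minus γ times the estimated total correlation. A minimal sketch, assuming the three terms have already been computed and averaged over the batch (the function name and default γ are illustrative, not values from the paper):

```python
def factorvae_objective(log_px_z, kl_q_p, tc_est, gamma=1.0):
    """FactorVAE objective (to be maximized).

    log_px_z : batch-averaged reconstruction term E_q(z|x)[log p(x|z)]
    kl_q_p   : batch-averaged KL(q(z|x) || p(z)) from the standard ELBO
    tc_est   : estimated total correlation KL(q(z) || prod_j q(z_j)),
               e.g. from the discriminator's density-ratio estimate
    gamma    : weight on the TC penalty (placeholder default; the paper
               sweeps over several values of gamma)
    """
    return log_px_z - kl_q_p - gamma * tc_est
```

Setting γ = 0 recovers the plain VAE objective, which makes explicit that FactorVAE penalizes only the dependence between latent dimensions rather than the full KL term that β-VAE upweights.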
Experimental Setup and Results
The paper evaluates FactorVAE on several datasets, including 2D Shapes, 3D Shapes, 3D Faces, 3D Chairs, and CelebA. A noteworthy finding is that FactorVAE consistently outperforms β-VAE in terms of disentanglement metrics while maintaining comparable reconstruction quality.
Specifically, for the 2D Shapes dataset, FactorVAE achieves higher disentanglement scores at equivalent levels of reconstruction error compared to β-VAE. This improvement is underscored by the superior trade-off visible in the paper's figures plotting reconstruction error against disentanglement scores.
For metrics, the paper critiques the prevalent disentanglement metric proposed by Higgins et al. (2017), highlighting a failure mode and its sensitivity to classifier hyperparameters. Instead, Kim and Mnih propose a new metric, which evaluates the variance of each latent dimension when a particular factor is held constant and all others are varied. This metric demonstrates robustness and aligns more closely with qualitative assessments of disentanglement.
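The proposed metric can be sketched in a few lines. For each batch generated with one ground-truth factor fixed, take the latent dimension with the lowest variance; a majority-vote classifier then maps that dimension to the fixed factor, and the score is the classifier's accuracy. This is an illustrative reimplementation under stated assumptions, not the authors' code; in particular, it assumes latents have already been normalised by their empirical standard deviation over the full dataset, as the paper prescribes:

```python
import numpy as np
from collections import Counter

def disentanglement_score(z_batches, fixed_factors):
    """Sketch of the Kim & Mnih metric.

    z_batches     : list of arrays of shape (batch, latent_dim), each
                    encoding samples generated with one ground-truth
                    factor held fixed; latents assumed pre-normalised
                    by their empirical std over the data.
    fixed_factors : index of the fixed factor for each batch.
    """
    # one vote per batch: (argmin-variance latent dim, fixed factor)
    votes = [(int(np.argmin(z.var(axis=0))), f)
             for z, f in zip(z_batches, fixed_factors)]
    # majority-vote table: latent dim -> most common fixed factor
    table = {}
    for d, f in votes:
        table.setdefault(d, Counter())[f] += 1
    predict = {d: c.most_common(1)[0][0] for d, c in table.items()}
    # metric = accuracy of the majority-vote classifier on the votes
    return sum(predict[d] == f for d, f in votes) / len(votes)
```

Because the classifier has no tunable hyperparameters, the score avoids the sensitivity that the paper identifies in the earlier metric.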
Implications and Future Work
FactorVAE's ability to disentangle factors of variation without supervision exemplifies a significant contribution to representation learning. This advancement can be leveraged to enhance various AI applications where interpretable and disentangled representations are crucial, such as model-based reinforcement learning, visual concept learning, and semantically meaningful image manipulations.
Potential avenues for future work include integrating discrete latent variables to more effectively capture discrete factors of variation, as well as developing reliable unsupervised disentanglement metrics that do not depend on the ground truth. The exploration of hybrid models that can jointly model discrete and continuous factors presents an additional promising direction.
Conclusion
In summary, "Disentangling by Factorising" offers a compelling approach to unsupervised learning of disentangled representations via FactorVAE. By effectively addressing the limitations of β-VAE and proposing a robust disentanglement metric, this paper sets a new direction for future research in the field of representation learning. The enhanced trade-off between reconstruction and disentanglement achieved by FactorVAE highlights its practical utility, marking it as a noteworthy advancement in generative modeling and machine learning.