Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models (2410.23820v1)
Abstract: Disentangled representation learning (DRL) aims to decompose observed data into its core intrinsic factors for a deeper understanding of the data. In real-world scenarios, manually defining and labeling these factors is non-trivial, making unsupervised methods attractive. Recently, there have been limited explorations of utilizing diffusion models (DMs), which are already mainstream in generative modeling, for unsupervised DRL. These methods implement their own inductive biases to ensure that each latent unit fed into the DM expresses only one distinct factor. In this context, we design Dynamic Gaussian Anchoring to enforce attribute-separated latent units for more interpretable DRL. This unconventional inductive bias explicitly delineates the decision boundaries between attributes while also promoting independence among latent units. We also propose the Skip Dropout technique, which easily modifies the denoising U-Net to be more DRL-friendly, addressing its uncooperative nature with the disentangling feature extractor. Our methods, which carefully consider latent unit semantics and the distinct DM structure, enhance the practicality of DM-based disentangled representations, achieving state-of-the-art disentanglement performance on both synthetic and real data, as well as advantages in downstream tasks.
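The abstract describes Skip Dropout only at a high level. A minimal sketch of the general idea, assuming (as the name and the U-Net context suggest) that it stochastically drops the U-Net's encoder-to-decoder skip connections during training so the decoder must rely on the semantic latent units rather than copying low-level detail; the function name, signature, and per-sample dropping scheme below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def skip_dropout(skip_feat, p=0.5, rng=None, training=True):
    """Illustrative sketch: randomly zero a U-Net skip-connection tensor.

    Dropping the skip path (with probability p, per sample) pushes the
    denoising decoder to draw information from the bottleneck, where the
    disentangling latent units enter, instead of shortcutting through
    low-level encoder features. Inverted-dropout rescaling keeps the
    expected activation magnitude unchanged.
    """
    if not training or p == 0.0:
        return skip_feat
    rng = rng or np.random.default_rng()
    # Per-sample keep mask, broadcast over all feature dimensions.
    keep = rng.random(size=(skip_feat.shape[0],) + (1,) * (skip_feat.ndim - 1)) >= p
    return skip_feat * keep / (1.0 - p)
```

At inference time (`training=False`) the skip connection passes through unchanged, mirroring standard dropout behavior.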