Learning Latent Space Hierarchical EBM Diffusion Models (2405.13910v2)

Published 22 May 2024 in cs.LG, cs.CV, and stat.ML

Abstract: This work studies the learning problem of the energy-based prior model and the multi-layer generator model. The multi-layer generator model, which contains multiple layers of latent variables organized in a top-down hierarchical structure, typically assumes the Gaussian prior model. Such a prior model can be limited in modelling expressivity, which results in a gap between the generator posterior and the prior model, known as the prior hole problem. Recent works have explored learning the energy-based (EBM) prior model as a second-stage, complementary model to bridge the gap. However, the EBM defined on a multi-layer latent space can be highly multi-modal, which makes sampling from such marginal EBM prior challenging in practice, resulting in ineffectively learned EBM. To tackle the challenge, we propose to leverage the diffusion probabilistic scheme to mitigate the burden of EBM sampling and thus facilitate EBM learning. Our extensive experiments demonstrate a superior performance of our diffusion-learned EBM prior on various challenging tasks.

Summary

  • The paper studies learning an energy-based (EBM) prior for multi-layer generator models, addressing the prior hole problem that arises with the standard Gaussian prior.
  • It integrates a diffusion probabilistic scheme into EBM learning, easing sampling from the highly multi-modal EBM defined over the hierarchical latent space.
  • Extensive experiments show that the diffusion-learned EBM prior outperforms comparable models on a range of challenging generative tasks.

The paper "Learning Latent Space Hierarchical EBM Diffusion Models" examines the intricacies of energy-based prior models (EBMs) and multi-layer generator models, specifically addressing the limitations associated with Gaussian prior models. Gaussian priors, commonly used in multi-layer generator models, tend to fall short in terms of modeling expressivity. This inadequacy leads to what is known as the "prior hole problem," where a discrepancy emerges between the generator's posterior distribution and the prior model.

To overcome this issue, recent research has introduced EBMs as complementary second-stage prior models. This approach, however, brings its own challenge: an EBM defined over a multi-layer latent space can be highly multi-modal, which makes sampling from such a marginal EBM prior difficult in practice and leads to an ineffectively learned EBM.
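
To see where the difficulty comes from, here is a minimal sketch of the kind of short-run Langevin sampler typically used to draw from a latent EBM prior of the form p(z) ∝ exp(-E(z)) N(z; 0, I); the energy network, step size, and number of steps are assumptions for illustration, not the paper's exact settings.

```python
# Hedged sketch: short-run Langevin MCMC targeting an exponentially
# tilted Gaussian prior exp(-E(z)) N(z; 0, I).
import torch
import torch.nn as nn

class LatentEnergy(nn.Module):
    """Small MLP energy function over a latent vector z (illustrative)."""
    def __init__(self, dim: int = 64, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).squeeze(-1)   # one energy value per sample

def langevin_sample(energy, z0, n_steps: int = 60, step_size: float = 0.1):
    """Short-run Langevin dynamics; mixes poorly when E(z) is multi-modal."""
    z = z0.clone().requires_grad_(True)
    for _ in range(n_steps):
        # Negative log-density of the tilted Gaussian prior (up to a constant).
        neg_logp = energy(z).sum() + 0.5 * (z ** 2).sum()
        grad, = torch.autograd.grad(neg_logp, z)
        with torch.no_grad():
            z = z - 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()

# Chains initialized from the Gaussian base can get trapped in one mode,
# so the samples used to estimate the EBM learning gradient become biased.
z_prior = langevin_sample(LatentEnergy(), torch.randn(16, 64))
```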

The authors address this problem by integrating a diffusion probabilistic scheme into the EBM learning process. The diffusion scheme eases the burden of sampling from the EBM and thereby improves learning, facilitating more effective training of EBM priors over hierarchical latent spaces.
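
The following is a rough sketch of that diffusion-style idea under some assumptions: instead of sampling the multi-modal marginal EBM directly, one samples from a sequence of per-noise-level conditionals of the form p(z_{t-1} | z_t) ∝ exp(-E_t(z_{t-1})) N(z_{t-1}; z_t, sigma_t^2 I), each of which is far less multi-modal. The function names, noise schedule, and loop structure are illustrative, not the authors' exact scheme.

```python
# Hedged sketch of diffusion-assisted sampling for a latent EBM prior.
import torch

def conditional_langevin(energy_t, z_t, sigma_t,
                         n_steps: int = 30, step_size: float = 0.05):
    """Langevin sampling from an EBM-tilted Gaussian centred at the noisy z_t."""
    z = z_t.clone().requires_grad_(True)
    for _ in range(n_steps):
        neg_logp = energy_t(z).sum() + ((z - z_t) ** 2).sum() / (2 * sigma_t ** 2)
        grad, = torch.autograd.grad(neg_logp, z)
        with torch.no_grad():
            z = z - 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()

def reverse_sample(energies, sigmas, z_T):
    """Run the reverse chain from pure noise down to a clean prior sample."""
    z = z_T
    for energy_t, sigma_t in zip(reversed(energies), reversed(sigmas)):
        z = conditional_langevin(energy_t, z, sigma_t)
    return z
```

Because each conditional stays close to a Gaussian centred at the noisier sample, the MCMC chains mix quickly, which is the practical benefit the paper attributes to the diffusion scheme.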

The paper presents extensive experimental results demonstrating that the diffusion-learned EBM prior outperforms traditional models on a variety of challenging tasks. This evidence suggests that the proposed method not only closes the gap created by the prior hole problem but also improves the model's overall performance and applicability.

The advancement put forth in this paper signifies a promising direction for future research in EBMs and hierarchical generative models, potentially offering more robust solutions for tasks that require complex modeling capabilities.
