
Denoising Diffusion Variational Inference: Diffusion Models as Expressive Variational Posteriors (2401.02739v4)

Published 5 Jan 2024 in cs.LG, q-bio.QM, and stat.ML

Abstract: We propose denoising diffusion variational inference (DDVI), a black-box variational inference algorithm for latent variable models which relies on diffusion models as flexible approximate posteriors. Specifically, our method introduces an expressive class of diffusion-based variational posteriors that perform iterative refinement in latent space; we train these posteriors with a novel regularized evidence lower bound (ELBO) on the marginal likelihood inspired by the wake-sleep algorithm. Our method is easy to implement (it fits a regularized extension of the ELBO), is compatible with black-box variational inference, and outperforms alternative classes of approximate posteriors based on normalizing flows or adversarial networks. We find that DDVI improves inference and learning in deep latent variable models across common benchmarks as well as on a motivating task in biology -- inferring latent ancestry from human genomes -- where it outperforms strong baselines on the Thousand Genomes dataset.

Summary

  • The paper introduces DDVI, which integrates a diffusion process into variational inference to create more expressive posterior distributions.
  • It employs a wake-sleep framework with a novel regularization approach, resulting in improved performance in clustering and semi-supervised learning tasks.
  • Empirical results on human genetic data highlight its potential to advance latent variable modeling in generative tasks and complex decision-making.

Introduction

Latent variable models (LVMs) represent complex data in a lower-dimensional latent space, enabling tasks such as dimensionality reduction, data visualization, and unsupervised learning. Variational inference (VI) is a prominent technique for approximating posterior distributions in these models, and its performance depends substantially on the expressivity of the approximate posterior. This paper introduces denoising diffusion variational inference (DDVI), a method that increases the expressivity of variational posteriors by using diffusion models, yielding a new class of algorithms that includes the denoising diffusion VAE (DD-VAE).

Variational Inference with Denoising Diffusion Models

DDVI incorporates a diffusion model within the variational posterior, transforming a simple latent representation into a complex one through iterative refinement in latent space. Inspired by the wake-sleep algorithm, DDVI combines a wake phase with a reconstruction loss and a sleep phase with a novel form of regularization that encourages the posterior to match a user-specified noising process (a sketch of this two-phase objective follows below). The empirical results suggest that the DD-VAE, an instantiation of this framework, can outperform alternative methods, particularly in tasks where capturing semantically meaningful structure is crucial.
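
To make the two-phase objective concrete, here is a minimal PyTorch-style sketch of one way the wake and sleep losses could be computed. The `decoder`, `denoiser`, and `alpha_bar` schedule are hypothetical placeholders, and the DDIM-style reverse step and the specific form of the sleep regularizer are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of a DDVI-style training step (not the authors' reference code).
import torch
import torch.nn.functional as F

def ddim_step(z_t, z0_hat, alpha_bar, t):
    """Deterministic (DDIM-style) reverse step from noise level t to t-1."""
    eps = (z_t - alpha_bar[t].sqrt() * z0_hat) / (1 - alpha_bar[t]).sqrt()
    a_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0)
    return a_prev.sqrt() * z0_hat + (1 - a_prev).sqrt() * eps

def ddvi_losses(x, decoder, denoiser, alpha_bar, latent_dim, num_steps):
    B = x.shape[0]

    # Wake phase: refine Gaussian noise into a posterior sample z ~ q(z|x)
    # by running the x-conditioned reverse chain, then reconstruct x.
    z = torch.randn(B, latent_dim)
    for t in reversed(range(num_steps)):
        t_batch = torch.full((B,), t)
        z0_hat = denoiser(z, t_batch, x)   # predicts the clean latent
        z = ddim_step(z, z0_hat, alpha_bar, t)
    wake_loss = F.mse_loss(decoder(z), x)  # stands in for -log p(x|z)

    # Sleep phase: regularize the posterior toward a user-specified noising
    # process. Draw z0 from the prior, corrupt it forward, and train the
    # denoiser to recover z0 -- the wake-sleep-inspired regularizer.
    z0 = torch.randn(B, latent_dim)        # prior sample (standard normal here)
    t = torch.randint(0, num_steps, (B,))
    noise = torch.randn_like(z0)
    a = alpha_bar[t].unsqueeze(-1)         # cumulative noise schedule
    z_t = a.sqrt() * z0 + (1 - a).sqrt() * noise
    sleep_loss = F.mse_loss(denoiser(z_t, t, x), z0)

    return wake_loss, sleep_loss
```

In practice the two losses would be weighted and summed into the regularized ELBO-style objective and backpropagated jointly through the decoder and the latent denoiser.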

Semi-Supervised Learning and Clustering Applications

In semi-supervised settings, DDVI extends the latent variable model to accommodate labels observed for part of the data, so the model can be fitted on a mixture of labeled and unlabeled samples. For clustering, DDVI offers two options: retain the model's original prior and introduce an additional cluster latent variable, or partition the prior into a mixture whose components correspond to clusters (see the sketch after this paragraph). The method's application to semi-supervised inference of human ancestry from genomes demonstrates its ability to capture semantically rich structure in the data.
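
For the mixture-prior variant, the following is a minimal sketch (assuming PyTorch) of how a clustered prior could be constructed and sampled; the component centers, equal weights, and unit scales are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: a latent prior partitioned into K Gaussian components,
# one per cluster, whose samples could feed the sleep-phase regularizer.
import torch
import torch.distributions as D

def make_cluster_prior(num_clusters: int, latent_dim: int, spread: float = 3.0):
    """Equal-weight Gaussian mixture whose components act as cluster regions."""
    means = spread * torch.randn(num_clusters, latent_dim)  # fixed component centers
    mix = D.Categorical(torch.ones(num_clusters))           # uniform cluster weights
    comp = D.Independent(D.Normal(means, torch.ones_like(means)), 1)
    return D.MixtureSameFamily(mix, comp)

prior = make_cluster_prior(num_clusters=10, latent_dim=2)
z0 = prior.sample((64,))  # prior draws, e.g. for the sleep-phase regularizer
```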

Discussion and Conclusion

DDVI and DD-VAE are significant steps toward more expressive variational posteriors, offering clear advantages over more rigid families. Clustering and semi-supervised learning results on human genetic data underline the potential of diffusion-based encoders in latent variable models. The regularized learning objective and the ability to direct the posterior toward intricate distributions hold promise for a variety of applications. Nonetheless, because the objective is a regularized extension of the standard evidence lower bound, it requires careful attention to the choice of regularizer and the additional hyperparameters it introduces.

Advances in latent variable modeling such as DDVI continue to pave the way for applying generative models to complex decision-making and estimation tasks that rest on solid inference foundations. While this paper focuses on dimensionality reduction and visualization, future work may extend DD-VAE to other domains, supported by regularized and expressive variational inference techniques.