Forward $\chi^2$ Divergence Based Variational Importance Sampling (2311.02516v2)

Published 4 Nov 2023 in cs.LG, stat.CO, and stat.ML

Abstract: Maximizing the log-likelihood is a crucial aspect of learning latent variable models, and variational inference (VI) stands as the commonly adopted method. However, VI can encounter challenges in achieving a high log-likelihood when dealing with complicated posterior distributions. In response to this limitation, we introduce a novel variational importance sampling (VIS) approach that directly estimates and maximizes the log-likelihood. VIS leverages the optimal proposal distribution, achieved by minimizing the forward $\chi^2$ divergence, to enhance log-likelihood estimation. We apply VIS to various popular latent variable models, including mixture models, variational auto-encoders, and partially observable generalized linear models. Results demonstrate that our approach consistently outperforms state-of-the-art baselines, both in terms of log-likelihood and model parameter estimation.
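As background, here is a minimal sketch using standard importance-sampling identities; the notation $p_\theta(x,z)$, $q_\phi(z\mid x)$, and $w(z)$ is ours and not necessarily the paper's. The marginal likelihood is an expectation of the importance weight under the proposal, it can be estimated with $K$ samples, and the estimator's variance is governed by the forward $\chi^2$ divergence from the true posterior to the proposal:

$$
p_\theta(x) = \mathbb{E}_{q_\phi(z\mid x)}\!\left[w(z)\right],
\qquad
w(z) = \frac{p_\theta(x,z)}{q_\phi(z\mid x)},
$$

$$
\log p_\theta(x) \approx \log \frac{1}{K}\sum_{k=1}^{K} w(z_k),
\qquad z_k \sim q_\phi(z\mid x),
$$

$$
\operatorname{Var}_{q_\phi}\!\left[w(z)\right]
= p_\theta(x)^2\, D_{\chi^2}\!\bigl(p_\theta(z\mid x)\,\|\,q_\phi(z\mid x)\bigr),
\qquad
D_{\chi^2}(p\,\|\,q) = \mathbb{E}_{q}\!\left[\left(\tfrac{p}{q}\right)^{2}\right] - 1.
$$

Under these standard definitions, minimizing the forward $\chi^2$ divergence over $\phi$ reduces the variance of the importance-sampling log-likelihood estimate, which can then be maximized over the model parameters $\theta$.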
