Doubly Adaptive Importance Sampling (2404.18556v2)

Published 29 Apr 2024 in stat.CO

Abstract: We propose an adaptive importance sampling scheme for Gaussian approximations of intractable posteriors. Optimization-based approximations like variational inference can be too inaccurate while existing Monte Carlo methods can be too slow. Therefore, we propose a hybrid where, at each iteration, the Monte Carlo effective sample size can be guaranteed at a fixed computational cost by interpolating between natural-gradient variational inference and importance sampling. The amount of damping in the updates adapts to the posterior and guarantees the effective sample size. Gaussianity enables the use of Stein's lemma to obtain gradient-based optimization in the highly damped variational inference regime and a reduction of Monte Carlo error for undamped adaptive importance sampling. The result is a generic, embarrassingly parallel and adaptive posterior approximation method. Numerical studies on simulated and real data show its competitiveness with other, less general methods.
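
To make the effective-sample-size (ESS) mechanism described in the abstract concrete, here is a minimal, illustrative Python/NumPy sketch of one adaptive importance sampling iteration with a Gaussian proposal. This is not the paper's algorithm: the names (`log_target`, `ais_step`), the toy standard-normal target, and the simple ESS-based choice of the damping factor `eta` are all illustrative assumptions. In the paper, the damping instead interpolates between natural-gradient variational inference and undamped importance sampling, with Stein's lemma supplying the gradient-based updates, which this sketch does not reproduce.

```python
import numpy as np

def log_target(x):
    # Hypothetical unnormalized log-posterior: a standard normal, purely for illustration.
    return -0.5 * np.sum(x ** 2, axis=-1)

def ess(log_w):
    # Monte Carlo effective sample size computed from unnormalized log importance weights.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def ais_step(mu, Sigma, n=1000, target_ess_frac=0.5, rng=None):
    # One damped adaptive-importance-sampling update of a Gaussian proposal N(mu, Sigma).
    rng = rng or np.random.default_rng(0)
    x = rng.multivariate_normal(mu, Sigma, size=n)           # sample from the proposal
    diff = x - mu
    # Proposal log-density up to an additive constant (constants cancel in normalized weights).
    log_q = -0.5 * np.einsum("ni,ij,nj->n", diff, np.linalg.inv(Sigma), diff)
    log_w = log_target(x) - log_q                             # unnormalized log importance weights

    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    mu_hat = w @ x                                            # weighted (moment-matched) mean
    Sigma_hat = (w[:, None] * (x - mu_hat)).T @ (x - mu_hat)  # weighted covariance

    # Heuristic damping: take a smaller step when the observed ESS is low, so the proposal
    # moves cautiously. The paper derives the damping so that the ESS is guaranteed at a
    # fixed computational cost; this simple back-off only mimics that behaviour.
    eta = min(1.0, ess(log_w) / (target_ess_frac * n))
    mu_new = (1.0 - eta) * mu + eta * mu_hat
    Sigma_new = (1.0 - eta) * Sigma + eta * Sigma_hat
    return mu_new, Sigma_new, ess(log_w)

mu, Sigma, e = ais_step(np.zeros(2), 4.0 * np.eye(2))
print(mu, Sigma, e)
```

Iterating this step refines the Gaussian proposal: when `eta` reaches 1 the update reduces to plain moment-matched adaptive importance sampling, while small `eta` corresponds to the heavily damped, variational-inference-like regime the abstract describes.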
