A connection between Tempering and Entropic Mirror Descent (2310.11914v3)

Published 18 Oct 2023 in stat.CO, math.OC, math.ST, stat.ML, and stat.TH

Abstract: This paper explores the connections between tempering (for Sequential Monte Carlo; SMC) and entropic mirror descent to sample from a target probability distribution whose unnormalized density is known. We establish that tempering SMC corresponds to entropic mirror descent applied to the reverse Kullback-Leibler (KL) divergence and obtain convergence rates for the tempering iterates. Our result motivates the tempering iterates from an optimization point of view, showing that tempering can be seen as a descent scheme of the KL divergence with respect to the Fisher-Rao geometry, in contrast to Langevin dynamics, which perform descent of the KL with respect to the Wasserstein-2 geometry. We exploit the connection between tempering and mirror descent iterates to justify common practices in SMC and to derive adaptive tempering rules that improve over alternative benchmarks from the literature.
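
To make the stated correspondence concrete, here is a short worked derivation (a sketch under the standard setup, writing $\mu_0$ for the initial distribution and $\pi$ for the target): constant-step entropic mirror descent on the reverse KL objective recovers exactly the geometric tempering path used in SMC.

```latex
% Entropic mirror descent on F(\mu) = \mathrm{KL}(\mu \,\|\, \pi) with step size \eta \in (0,1].
% The first variation is \frac{\delta F}{\delta \mu}(\mu_k) = \log(\mu_k/\pi) + 1, so the
% multiplicative (entropic) update reads
\mu_{k+1} \;\propto\; \mu_k \exp\!\Big(-\eta\,\tfrac{\delta F}{\delta \mu}(\mu_k)\Big)
        \;\propto\; \mu_k^{\,1-\eta}\,\pi^{\,\eta}.
% Unrolling from \mu_0 gives the geometric tempering path
\mu_k \;\propto\; \mu_0^{\,1-\lambda_k}\,\pi^{\,\lambda_k},
\qquad \lambda_k = 1 - (1-\eta)^k \;\longrightarrow\; 1.
```

On the adaptive side, a minimal self-contained sketch of the common ESS-based adaptive tempering rule is given below. This is the standard SMC heuristic the abstract's benchmarks relate to, not necessarily the paper's derived rule; the toy Gaussian pair and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy geometric path pi_lambda \propto pi_0^{1-lambda} * pi^lambda,
# tempering from a broad prior pi_0 = N(0, 3^2) to a target pi = N(2, 1).
def log_prior(x):
    return -0.5 * (x / 3.0) ** 2

def log_target(x):
    return -0.5 * (x - 2.0) ** 2

def ess(log_w):
    """Effective sample size of the normalized importance weights."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w**2)

def next_lambda(x, lam, frac=0.5):
    """Largest next exponent whose incremental weights keep
    ESS >= frac * N, found by bisection (the usual adaptive rule)."""
    n = len(x)
    delta_log = log_target(x) - log_prior(x)  # d(log pi_lambda)/d(lambda)
    if ess((1.0 - lam) * delta_log) >= frac * n:
        return 1.0  # can jump straight to the target
    lo, hi = lam, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if ess((mid - lam) * delta_log) >= frac * n:
            lo = mid
        else:
            hi = mid
    return lo

# Particles from pi_0, held fixed here purely for illustration; a full
# SMC sampler would reweight, resample and MCMC-move after each
# increment so the particles track pi_lambda.
x = rng.normal(0.0, 3.0, size=2_000)
lam = 0.0
while lam < 1.0:
    lam = next_lambda(x, lam)
    print(f"lambda = {lam:.4f}")
```

The bisection picks the largest exponent increment whose incremental importance weights, proportional to $(\pi/\pi_0)^{\lambda'-\lambda}$, keep the effective sample size above a fixed fraction of the particle count.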
