Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space (2312.02849v4)

Published 5 Dec 2023 in math.ST, cs.LG, math.OC, and stat.TH

Abstract: We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a \emph{polyhedral} set $\mathcal{P}_\diamond$, and (2) an algorithm for minimizing $\text{KL}(\cdot|\pi)$ over $\mathcal{P}_\diamond$ based on accelerated gradient descent over $\mathbb{R}^d$. As a byproduct of our analysis, we obtain the first end-to-end analysis for gradient-based algorithms for MFVI.
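
The abstract's algorithmic component reduces the KL minimization to accelerated gradient descent over $\mathbb{R}^d$. As a rough illustration of what an accelerated first-order scheme for mean-field variational inference looks like, here is a minimal sketch assuming a Gaussian target and a Gaussian product-measure family, so that $\text{KL}(q|\pi)$ and its gradient are available in closed form. This is not the paper's polyhedral construction; all names and constants below are illustrative.

```python
# Hypothetical sketch (not the paper's polyhedral algorithm): Nesterov-accelerated
# gradient descent for a Gaussian mean-field approximation of a Gaussian target
# N(mu, Sigma), where KL(q | pi) has a closed form. Names and constants are illustrative.
import numpy as np

d = 5
rng = np.random.default_rng(0)
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)      # strongly log-concave, log-smooth target: pi = N(mu, Sigma)
Sigma_inv = np.linalg.inv(Sigma)
mu = rng.standard_normal(d)

def kl_and_grad(m, log_s):
    """KL( N(m, diag(exp(2*log_s))) || N(mu, Sigma) ) and its gradient in (m, log_s)."""
    s2 = np.exp(2.0 * log_s)                      # variances of the product measure
    diff = m - mu
    kl = 0.5 * (np.sum(np.diag(Sigma_inv) * s2) + diff @ Sigma_inv @ diff
                - d + np.linalg.slogdet(Sigma)[1] - np.sum(2.0 * log_s))
    grad_m = Sigma_inv @ diff
    grad_log_s = np.diag(Sigma_inv) * s2 - 1.0    # chain rule through s2 = exp(2*log_s)
    return kl, np.concatenate([grad_m, grad_log_s])

# Nesterov's accelerated gradient method on the stacked parameters (m, log_s).
x = np.zeros(2 * d)                               # start from the standard Gaussian
y, t = x.copy(), 1.0
step = 1.0 / (np.linalg.eigvalsh(Sigma_inv).max() + 4.0)   # crude, conservative step size
for _ in range(500):
    _, g = kl_and_grad(y[:d], y[d:])
    x_new = y - step * g
    t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    y = x_new + ((t - 1.0) / t_new) * (x_new - x)
    x, t = x_new, t_new

print("KL after optimization:", kl_and_grad(x[:d], x[d:])[0])
```

In the paper's setting the optimization variable is instead a point in a polyhedral subset of the Wasserstein space, but once the problem is pulled back to a finite-dimensional parameterization over $\mathbb{R}^d$, the accelerated first-order update takes this same general form.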
