
Constrained Sampling with Primal-Dual Langevin Monte Carlo (2411.00568v2)

Published 1 Nov 2024 in stat.ML, cs.LG, and math.OC

Abstract: This work considers the problem of sampling from a probability distribution known up to a normalization constant while satisfying a set of statistical constraints specified by the expected values of general nonlinear functions. This problem finds applications in, e.g., Bayesian inference, where it can constrain moments to evaluate counterfactual scenarios or enforce desiderata such as prediction fairness. Methods developed to handle support constraints, such as those based on mirror maps, barriers, and penalties, are not suited for this task. This work therefore relies on gradient descent-ascent dynamics in Wasserstein space to put forward a discrete-time primal-dual Langevin Monte Carlo algorithm (PD-LMC) that simultaneously constrains the target distribution and samples from it. We analyze the convergence of PD-LMC under standard assumptions on the target distribution and constraints, namely (strong) convexity and log-Sobolev inequalities. To do so, we bring classical optimization arguments for saddle-point algorithms to the geometry of Wasserstein space. We illustrate the relevance and effectiveness of PD-LMC in several applications.


Summary

  • The paper introduces the PD-LMC algorithm that couples Langevin dynamics with primal-dual optimization to enforce statistical constraints in sampling.
  • It rigorously proves convergence under standard assumptions such as (strong) convexity and log-Sobolev inequalities, using saddle-point optimization arguments adapted to Wasserstein space.
  • Experiments validate the method’s effectiveness in applications such as fairness in predictive modeling and counterfactual Bayesian inference.

Constrained Sampling with Primal-Dual Langevin Monte Carlo: An Overview

The paper introduces a novel approach to constrained sampling from a probability distribution that is known only up to a normalization constant and that must satisfy statistical constraints specified as expected values of general nonlinear functions. Such constraints arise, for example, in Bayesian inference when enforcing fairness desiderata or evaluating counterfactual scenarios. The setting matters because conventional MCMC methods have no natural mechanism for enforcing constraints on expectations, and methods designed for support constraints (mirror maps, barriers, penalties) do not apply.
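To fix ideas, the constrained sampling problem can be written schematically as follows (notation is ours; inequality constraints are shown for concreteness):

```latex
\min_{\mu}\; \mathrm{KL}\!\left(\mu \,\middle\|\, \pi\right)
\quad \text{subject to} \quad
\mathbb{E}_{x \sim \mu}\!\left[g_i(x)\right] \le 0, \quad i = 1, \dots, m,
```

where $\pi \propto e^{-U}$ is the target known only up to normalization and the $g_i$ are the nonlinear constraint functions; the minimizer is the constrained distribution one wishes to sample from.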

Core Contributions

The authors present a new discrete-time algorithm, Primal-Dual Langevin Monte Carlo (PD-LMC), which couples Langevin dynamics, viewed through the lens of optimization in Wasserstein space, with saddle-point (gradient descent-ascent) methods. The primal updates draw samples from a Lagrangian-tilted version of the target, while dual-ascent updates adjust the multipliers to keep the constraints satisfied. In contrast to techniques built for support constraints, PD-LMC handles constraints on expectations directly, sampling from the target and enforcing the constraints simultaneously.

Key contributions of this paper include:

  1. Algorithm Development:
    • Introduction of PD-LMC, which embeds constrained-optimization techniques within the Langevin Monte Carlo framework (a minimal sketch is given after this list).
    • Simultaneous sampling and constraint enforcement via gradient descent-ascent dynamics in Wasserstein space, addressing the difficulty conventional methods have with statistical constraints.
  2. Theoretical Analysis:
    • Rigorous convergence guarantees for the proposed algorithm under standard assumptions on the target distribution and constraints, namely (strong) convexity and log-Sobolev inequalities.
    • Extension of classical saddle-point optimization arguments to the geometry of Wasserstein space to carry out the convergence analysis.
  3. Practical Implications and Experiments:
    • Validation of the PD-LMC algorithm's effectiveness through multiple applications, demonstrating its practical potential in scenarios such as ensuring fairness in predictive models and exploring counterfactual scenarios within Bayesian frameworks.
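To make the algorithmic idea concrete, below is a minimal NumPy sketch of a primal-dual Langevin loop in this spirit: a Langevin step on the Lagrangian potential followed by a projected dual-ascent step that uses the current sample as a one-sample estimate of the constraint expectation. This is an illustration under simplifying assumptions of our own (single chain, single-sample dual gradient, finite-difference constraint gradients), not the authors' reference implementation; `grad_log_target` and `g` are hypothetical user-supplied functions.

```python
import numpy as np

def numerical_jacobian(g, x, eps=1e-6):
    """Finite-difference Jacobian of g at x (stand-in for an analytic gradient)."""
    gx = np.asarray(g(x), dtype=float)
    J = np.zeros((gx.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (np.asarray(g(xp), dtype=float) - gx) / eps
    return J

def pd_lmc(grad_log_target, g, x0, n_iters=10_000, step_x=1e-3, step_lmbda=1e-2, rng=None):
    """Illustrative primal-dual Langevin loop (not the paper's reference code)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    lmbda = np.zeros(np.asarray(g(x)).size)  # one dual variable per constraint
    samples = []
    for _ in range(n_iters):
        # Primal step: Langevin update on the Lagrangian potential
        #   U_lambda(x) = -log pi(x) + lambda . g(x),
        # i.e. a Langevin step targeting pi(x) * exp(-lambda . g(x)).
        grad_g = numerical_jacobian(g, x)             # shape (n_constraints, dim)
        drift = grad_log_target(x) - lmbda @ grad_g   # gradient of the tilted log-density
        x = x + step_x * drift + np.sqrt(2.0 * step_x) * rng.standard_normal(x.shape)
        # Dual step: stochastic ascent on the constraint violation, estimated from
        # the current sample and projected onto the nonnegative orthant.
        lmbda = np.maximum(lmbda + step_lmbda * np.asarray(g(x), dtype=float), 0.0)
        samples.append(x.copy())
    return np.array(samples), lmbda
```

For example, with `grad_log_target = lambda x: -x` (a standard Gaussian) and `g = lambda x: np.array([1.0 - x[0]])` (encoding E[x_0] >= 1), the multiplier equilibrates so that the first coordinate of the samples has mean approximately 1.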

Mathematical Insights

The proposed PD-LMC method is a sampling counterpart of gradient descent-ascent. Recasting the constrained sampling problem as a saddle-point problem over a Lagrangian is computationally attractive because the dual variables remain finite-dimensional even though the primal variable is a distribution. The characterization of solutions via Lagrange multipliers ties the optimization machinery to the sampling procedure: for fixed multipliers, the constrained target is an exponentially tilted version of the original distribution, as sketched below.
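As a rough illustration of how the multipliers enter (notation is ours, following the schematic formulation above), the Lagrangian of the constrained problem and its minimizer over distributions for fixed multipliers read

```latex
\mathcal{L}(\mu, \lambda)
  = \mathrm{KL}\!\left(\mu \,\middle\|\, \pi\right)
  + \sum_{i=1}^{m} \lambda_i \, \mathbb{E}_{x \sim \mu}\!\left[g_i(x)\right],
  \quad \lambda_i \ge 0,
\qquad
\arg\min_{\mu}\, \mathcal{L}(\mu, \lambda)
  \;\propto\; \pi(x)\, e^{-\sum_{i} \lambda_i g_i(x)}.
```

Ascending in $\lambda$ therefore amounts to re-tilting the target until the constraints hold in expectation, which is exactly the role the dual update plays in the sketch after the contributions list.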

Performance & Implications

Empirically, the paper reports strong results that underscore PD-LMC's efficacy and flexibility on constrained sampling problems. The methodology could benefit applications that require sampling under enforced constraints, such as fairness-aware prediction, robust model assessment, and other settings where statistical requirements must hold within probabilistic models.

Future Directions

Extending the analysis to almost-sure convergence results and incorporating accelerated or proximal variants could strengthen PD-LMC, particularly for high-dimensional problems or more intricate statistical constraints. Exploring these directions may yield new applications and theoretical insights, broadening the utility of constrained sampling in machine learning research.

Overall, the paper combines algorithmic innovation with a solid theoretical foundation, paving the way for sampling methods that handle statistical constraints in a principled manner, and it represents a meaningful step forward for probabilistic modeling and machine learning research concerned with constraint enforcement.
