Constrained Sampling with Primal-Dual Langevin Monte Carlo (2411.00568v2)
Abstract: This work considers the problem of sampling from a probability distribution known up to a normalization constant while satisfying a set of statistical constraints specified by the expected values of general nonlinear functions. This problem finds applications in, e.g., Bayesian inference, where it can constrain moments to evaluate counterfactual scenarios or enforce desiderata such as prediction fairness. Methods developed to handle support constraints, such as those based on mirror maps, barriers, and penalties, are not suited for this task. This work therefore relies on gradient descent-ascent dynamics in Wasserstein space to put forward a discrete-time primal-dual Langevin Monte Carlo algorithm (PD-LMC) that simultaneously constrains the target distribution and samples from it. We analyze the convergence of PD-LMC under standard assumptions on the target distribution and constraints, namely (strong) convexity and log-Sobolev inequalities. To do so, we bring classical optimization arguments for saddle-point algorithms to the geometry of Wasserstein space. We illustrate the relevance and effectiveness of PD-LMC in several applications.
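The abstract describes a discrete-time gradient descent-ascent scheme: a Langevin (primal) step that samples from the target tilted by the current dual variables, interleaved with a dual ascent step on the constraint violations. The sketch below is a rough illustration of that idea under assumptions of our own, not the authors' exact PD-LMC algorithm; all names (`grad_U`, `g`, `grad_g`, the step sizes) and the specific update order are illustrative.

```python
import numpy as np

def pd_lmc_sketch(grad_U, g, grad_g, x0, lam0, eta_x=1e-2, eta_lam=1e-2,
                  n_iter=10_000, rng=None):
    """Illustrative primal-dual Langevin sketch (not the paper's exact PD-LMC).

    grad_U : gradient of the potential U, where the target is pi ∝ exp(-U)
    g      : constraint functions; we impose E_pi[g(x)] <= 0 componentwise
    grad_g : Jacobian of g at x, shape (num_constraints, dim)
    """
    rng = np.random.default_rng() if rng is None else rng
    x, lam = np.array(x0, float), np.array(lam0, float)
    samples = []
    for _ in range(n_iter):
        # Primal step: Langevin update on the tilted potential U(x) + lam . g(x).
        drift = grad_U(x) + grad_g(x).T @ lam
        x = x - eta_x * drift + np.sqrt(2 * eta_x) * rng.standard_normal(x.shape)
        # Dual step: stochastic gradient ascent on lam using the constraint value
        # at the current sample, projected onto the nonnegative orthant
        # (inequality constraints).
        lam = np.maximum(lam + eta_lam * g(x), 0.0)
        samples.append(x.copy())
    return np.array(samples), lam

# Hypothetical usage: a standard Gaussian constrained to have first moment >= 1,
# encoded as g(x) = 1 - x_0 with the requirement E[g(x)] <= 0.
samples, lam = pd_lmc_sketch(
    grad_U=lambda x: x,                                   # U(x) = ||x||^2 / 2
    g=lambda x: np.array([1.0 - x[0]]),
    grad_g=lambda x: np.array([[-1.0] + [0.0] * (x.size - 1)]),
    x0=np.zeros(2), lam0=np.zeros(1),
)
```

The key design point conveyed by the abstract is that the constraint is enforced on expectations over the sampled distribution rather than on the support, which is why the dual variable multiplies the constraint function inside the drift instead of acting as a projection or barrier.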