Maximum a Posteriori Inference for Factor Graphs via Benders' Decomposition (2410.19131v1)
Abstract: Many Bayesian statistical inference problems reduce to computing a maximum a posteriori (MAP) assignment of latent variables. Yet standard methods for estimating the MAP assignment offer no finite-time guarantee that the algorithm has converged to a fixed point. Previous research has shown that MAP inference can be represented in dual form as a linear programming problem with a non-polynomial number of constraints. A Lagrangian relaxation of this dual yields a statistical inference algorithm in the form of a tractable linear program; however, the decision of which constraints to drop in the relaxation is often heuristic. We present a method for maximum a posteriori inference in general Bayesian factor models that sequentially adds constraints to the fully relaxed dual problem using Benders' decomposition. Our method enables the incorporation of expressive integer and logical constraints in clustering problems, such as must-link, cannot-link, and a minimum number of whole samples allocated to each cluster. Using this approach, we derive MAP estimation algorithms for the Bayesian Gaussian mixture model and latent Dirichlet allocation. Empirical results show that our method attains a higher posterior value than Gibbs sampling and variational Bayes on standard data sets, and provides a certificate of convergence.
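The abstract's core idea, starting from a fully relaxed dual and sequentially adding constraints until none is violated, can be illustrated on a toy linear program. The sketch below is a generic cutting-plane / constraint-generation loop, not the paper's actual MAP formulation: the full LP has many constraints `a_i · z ≤ b_i`, but the relaxed master keeps only the ones generated so far, and termination (no violated constraint remaining) is exactly the finite-time certificate the abstract mentions. All problem data here are hypothetical.

```python
# Constraint-generation sketch (toy data, not the paper's MAP model):
# solve a relaxed master LP, separate the most violated constraint of the
# full problem, add it as a cut, and repeat. When no constraint is violated,
# the relaxed solution provably solves the full LP -- a convergence certificate.
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -1.0])                            # minimize -x - y (i.e. maximize x + y)
A_full = np.array([[i, 1.0] for i in range(1, 51)])   # 50 constraints: i*x + y <= i + 1
b_full = np.array([i + 1.0 for i in range(1, 51)])
bounds = [(0.0, 10.0), (0.0, 10.0)]

A_active, b_active = [], []                           # cuts generated so far
for _ in range(100):
    res = linprog(c,
                  A_ub=np.array(A_active) if A_active else None,
                  b_ub=np.array(b_active) if b_active else None,
                  bounds=bounds)
    violations = A_full @ res.x - b_full              # separation step over the full set
    worst = int(np.argmax(violations))
    if violations[worst] <= 1e-8:                     # certificate: relaxation is exact
        break
    A_active.append(A_full[worst])                    # add the most violated constraint
    b_active.append(b_full[worst])

print(len(A_active), -res.fun)                        # only a few of the 50 cuts are needed
```

On this toy instance the loop reaches the full-problem optimum (objective value 2) after adding only a handful of the 50 constraints, which is the practical appeal of the approach: most constraints of the dual never need to be instantiated.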