
Improving sample efficiency of high dimensional Bayesian optimization with MCMC (2401.02650v1)

Published 5 Jan 2024 in cs.LG and stat.ML

Abstract: Sequential optimization methods are often confronted with the curse of dimensionality in high-dimensional spaces. Current approaches under the Gaussian process framework are still burdened by the computational complexity of tracking Gaussian process posteriors and need to partition the optimization problem into small regions to ensure exploration or assume an underlying low-dimensional structure. With the idea of transiting the candidate points towards more promising positions, we propose a new method based on Markov Chain Monte Carlo to efficiently sample from an approximated posterior. We provide theoretical guarantees of its convergence in the Gaussian process Thompson sampling setting. We also show experimentally that both the Metropolis-Hastings and the Langevin Dynamics version of our algorithm outperform state-of-the-art methods in high-dimensional sequential optimization and reinforcement learning benchmarks.

Authors (5)
  1. Zeji Yi (9 papers)
  2. Yunyue Wei (7 papers)
  3. Chu Xin Cheng (4 papers)
  4. Kaibo He (4 papers)
  5. Yanan Sui (29 papers)
Citations (4)

Summary

Introduction

Bayesian optimization (BO) is an effective approach for optimizing black-box functions that lack gradient information. It has seen success across real-world engineering problems and machine learning applications such as hyperparameter tuning. At the heart of BO lies a surrogate model that approximates the objective function's landscape and quantifies the uncertainty of that approximation; Gaussian processes (GPs) are the typical choice for this surrogate because their posterior provides both a prediction and a measure of uncertainty at every candidate point.
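
As a rough illustration of this generic loop (not the paper's method), the sketch below fits a GP surrogate to the observed points and chooses the next query by maximizing an upper-confidence-bound acquisition over a random candidate set; the toy objective, RBF kernel, candidate count, and trade-off parameter beta are assumptions made for the example.

```python
# Minimal sketch of a generic GP-surrogate BO loop (illustrative only; the toy
# objective, RBF kernel, and UCB trade-off beta are assumptions, not choices
# made in the paper).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                                  # toy black-box function
    return -np.sum((x - 0.5) ** 2, axis=-1)

dim, n_init, n_iters, beta = 6, 10, 30, 2.0
rng = np.random.default_rng(0)

X = rng.uniform(0, 1, size=(n_init, dim))          # initial design
y = objective(X)

for _ in range(n_iters):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), normalize_y=True)
    gp.fit(X, y)                                   # refit the surrogate
    cand = rng.uniform(0, 1, size=(2000, dim))     # random candidate set
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(mu + beta * sigma)]    # UCB acquisition
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("best value found:", y.max())
```

Note that the acquisition is optimized over an explicit candidate set; covering a high-dimensional domain this way is exactly where the approach starts to strain.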

However, as the problem's dimensionality increases, BO suffers from the curse of dimensionality: the candidate space grows exponentially and tracking the GP posterior becomes computationally expensive. This paper introduces MCMC-BO, a method that uses Markov chain Monte Carlo (MCMC) to improve the sample efficiency of BO in high-dimensional spaces.

Related Work

Existing efforts to mitigate the dimensionality problem in BO focus on partitioning the search space, ranging from trust regions to tree-based partitions that isolate promising regions for sampling. Although these methods improve the sampling strategy, they often rely on discretizing the search space, which becomes less effective as dimensionality grows. The challenge remains to balance exploration of highly uncertain regions with exploitation of known good ones, all while keeping computational demands in check.

Algorithm Design

At its core, MCMC-BO integrates the principles of BO with the sampling machinery of MCMC. The procedure transitions candidate points toward areas of the search space that appear promising under Gaussian process Thompson sampling. Unlike traditional BO, which may require storing and computing over a vast discretized mesh of points, MCMC-BO tracks only a manageable batch of points. By doing so, the algorithm retains theoretical performance guarantees while significantly reducing memory and computational overhead.
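
The sketch below illustrates this idea under stated assumptions: a tracked batch of candidates is locally perturbed, a joint Thompson sample of the GP posterior is drawn at the current and proposed points, and each move is accepted with a Metropolis-Hastings rule. The toy objective, proposal scale, batch size, and use of scikit-learn's GP are illustrative choices, not the authors' implementation.

```python
# Sketch of the "transit a batch of candidates" idea (assumptions: the toy
# objective, Gaussian random-walk proposal scale, batch size, and use of
# scikit-learn's GP; this is not the authors' implementation).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                                   # toy black-box function
    return -np.sum((x - 0.5) ** 2, axis=-1)

dim, batch_size, n_rounds = 50, 8, 20
rng = np.random.default_rng(0)

X = rng.uniform(0, 1, size=(batch_size, dim))       # observed points so far
y = objective(X)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)

batch = X.copy()                                     # tracked candidate batch
for _ in range(n_rounds):
    gp.fit(X, y)
    # Propose a local move for every tracked candidate.
    proposal = np.clip(batch + 0.05 * rng.standard_normal(batch.shape), 0.0, 1.0)
    # One joint Thompson sample of the GP posterior at current and proposed points.
    f = gp.sample_y(np.vstack([batch, proposal]),
                    random_state=int(rng.integers(1 << 31))).ravel()
    f_cur, f_prop = f[:batch_size], f[batch_size:]
    # Metropolis-Hastings acceptance for a target proportional to exp(f).
    accept = np.log(rng.uniform(size=batch_size)) < (f_prop - f_cur)
    batch = np.where(accept[:, None], proposal, batch)
    # Evaluate the transited batch and fold it back into the GP data set.
    X = np.vstack([X, batch])
    y = np.concatenate([y, objective(batch)])

print("best value found:", y.max())
```

Accepting a move whenever a posterior draw assigns it a higher value biases the batch toward regions the Thompson sample considers promising, without ever materializing a dense candidate mesh over the full domain.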

The paper details two MCMC strategies: Metropolis-Hastings (MH) and Langevin dynamics (LD). Both are adapted to the BO setting and used to transit candidate points according to the Gaussian process posterior.
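
For the Langevin variant, a minimal sketch of the generic (unadjusted) Langevin update is shown below; in MCMC-BO the drift term would come from the gradient of a GP posterior sample, which is stood in for here by a toy quadratic log-target, and the step size is an arbitrary illustrative choice.

```python
# Generic (unadjusted) Langevin update used as the other transition kernel.
# In MCMC-BO the drift would come from the gradient of a GP posterior sample;
# a toy quadratic log-target stands in here (an assumption for illustration).
import numpy as np

def langevin_step(x, grad_log_target, step_size, rng):
    """Move x uphill on the log-target plus Gaussian exploration noise."""
    noise = rng.standard_normal(x.shape)
    return x + 0.5 * step_size * grad_log_target(x) + np.sqrt(step_size) * noise

rng = np.random.default_rng(0)
points = rng.uniform(0, 1, size=(8, 50))             # batch of candidates
grad = lambda z: -(z - 0.5)                          # gradient of a toy log-target
for _ in range(200):
    points = langevin_step(points, grad, step_size=1e-3, rng=rng)

print("mean distance to optimum:", np.linalg.norm(points - 0.5, axis=1).mean())
```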

Theoretical Guarantee and Experiments

The authors provide a theoretical framework guaranteeing the convergence of MCMC-BO. They argue that the algorithm circumvents the limitations posed by high dimensionality, striking a balance between exploration and exploitation without the excessive memory use associated with fine-grained discretization of high-dimensional spaces.

Experimental results show MCMC-BO outperforming standard BO methods and state-of-the-art high-dimensional BO strategies across various benchmarks, including challenging MuJoCo locomotion tasks. Notably, the experiments demonstrate that MCMC-BO maintains good performance even as the problem dimension scales.

Conclusion

MCMC-BO addresses the critical challenge of sample efficiency in high-dimensional Bayesian optimization by introducing an MCMC-based local optimization method. Its theoretical foundations ensure that performance is not compromised even as dimensionality scales. As optimization tasks in high-dimensional spaces become more prevalent, methodologies like MCMC-BO will become increasingly valuable. The authors highlight that there is room for further enhancements, particularly in parallel computing and analytical backward computations, offering an exciting avenue for ongoing research.
