Stopping Bayesian Optimization with Probabilistic Regret Bounds (2402.16811v2)
Abstract: Bayesian optimization is a popular framework for efficiently tackling black-box search problems. As a rule, these algorithms operate by iteratively choosing what to evaluate next until some predefined budget has been exhausted. We investigate replacing this de facto stopping rule with criteria based on the probability that a point satisfies a given set of conditions. We focus on the prototypical example of an $(\epsilon, \delta)$-criterion: stop when a solution has been found whose value is within $\epsilon > 0$ of the optimum with probability at least $1 - \delta$ under the model. For Gaussian process priors, we show that Bayesian optimization satisfies this criterion under mild technical assumptions. Further, we give a practical algorithm for evaluating Monte Carlo stopping rules in a manner that is both sample efficient and robust to estimation error. These findings are accompanied by empirical results which demonstrate the strengths and weaknesses of the proposed approach.
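Below is a minimal, self-contained sketch of what a Monte Carlo check of the $(\epsilon, \delta)$-criterion could look like. It is not the paper's sample-efficient, estimation-error-robust algorithm; it is a naive illustration under simplifying assumptions: a noiseless GP with a fixed RBF kernel, candidates restricted to a finite grid, and a plain Monte Carlo average of the probability that the incumbent is $\epsilon$-optimal under the posterior. All function names and parameters here are hypothetical.

```python
# Naive Monte Carlo sketch of an (epsilon, delta)-stopping check for GP-based
# Bayesian optimization. Illustration only, not the paper's algorithm.
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel for 1-d inputs (assumed fixed hyperparameters)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_grid, noise=1e-6):
    """GP posterior mean and covariance over a grid of candidate points."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tg = rbf_kernel(x_train, x_grid)
    k_gg = rbf_kernel(x_grid, x_grid)
    solve = np.linalg.solve(k_tt, k_tg)          # K_tt^{-1} K_tg
    mean = solve.T @ y_train
    cov = k_gg - k_tg.T @ solve
    return mean, cov

def epsilon_delta_stop(x_train, y_train, x_grid,
                       eps=0.05, delta=0.05, n_samples=2048, seed=0):
    """Estimate P(max_x f(x) - f(x_incumbent) <= eps) under the posterior by
    Monte Carlo and return (stop, estimated probability)."""
    rng = np.random.default_rng(seed)
    mean, cov = gp_posterior(x_train, y_train, x_grid)
    # Joint posterior draws of f over the candidate grid (with jitter for stability).
    samples = rng.multivariate_normal(
        mean, cov + 1e-9 * np.eye(len(x_grid)), size=n_samples)
    incumbent = y_train.max()  # best observed value (noiseless assumption)
    # Fraction of posterior samples in which the incumbent is eps-optimal.
    prob = np.mean(samples.max(axis=1) - incumbent <= eps)
    return prob >= 1.0 - delta, prob

# Toy usage: a few observations of a 1-d function, then check the rule.
f = lambda x: np.sin(3.0 * x) * np.exp(-x)
x_train = np.array([0.1, 0.4, 0.55, 0.7, 0.9])
y_train = f(x_train)
x_grid = np.linspace(0.0, 1.0, 200)
stop, prob = epsilon_delta_stop(x_train, y_train, x_grid)
print(f"estimated P(eps-optimal) = {prob:.3f}, stop = {stop}")
```

In the paper's setting, the number of posterior samples is not fixed in advance as above; the proposed procedure adaptively controls the Monte Carlo error so that the stop/continue decision remains reliable despite estimation noise.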