SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces (2301.11832v4)
Abstract: Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.
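The reformulation described in the abstract can be illustrated with a short worked sketch. This is a hedged illustration under stated assumptions, not the paper's exact formulation: the symbols $f$, $\pi$, $N$, $n$, $w$, and the test functions $\varphi_k$ are introduced here for exposition only, and the maximum is assumed to be attained at some $x^\ast$.

```latex
% Assumption: f attains its maximum on the domain X at some point x^*.
% Step 1: maximising over points equals maximising over probability measures,
% with the optimum attained by the point mass at x^*.
\max_{x \in \mathcal{X}} f(x)
  \;=\; \max_{\pi \in \mathcal{P}(\mathcal{X})} \int_{\mathcal{X}} f(x)\, \mathrm{d}\pi(x),
\qquad \delta_{x^\ast} \in \operatorname*{arg\,max}_{\pi} \int_{\mathcal{X}} f \, \mathrm{d}\pi .

% Step 2: a batch of n points with weights w acts as a quadrature rule for pi.
\int_{\mathcal{X}} f(x)\, \mathrm{d}\pi(x) \;\approx\; \sum_{i=1}^{n} w_i\, f(x_i).

% Step 3 (recombination, sketched): given N candidates x_1, ..., x_N drawn from pi
% and n-1 test functions phi_k (hypothetical, e.g. from a low-rank kernel
% approximation), look for weights w in R^N satisfying linear constraints only:
w_j \ge 0, \qquad \sum_{j=1}^{N} w_j = 1, \qquad
\sum_{j=1}^{N} w_j\, \varphi_k(x_j) \;=\; \frac{1}{N} \sum_{j=1}^{N} \varphi_k(x_j),
\quad k = 1, \dots, n-1 .
```

Because every constraint in the last step is linear in $w$, the feasible set is convex, and a Carathéodory-type argument guarantees a solution with at most $n$ non-zero weights. The candidates carrying those weights would then form the batch, which is one way to read the abstract's contrast between non-convex acquisition maximisation and convex kernel recombination.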