Large Language Models to Enhance Bayesian Optimization (2402.03921v2)
Abstract: Bayesian optimization (BO) is a powerful approach for optimizing complex and expensive-to-evaluate black-box functions. Its importance is underscored in many applications, notably including hyperparameter tuning, but its efficacy depends on efficiently balancing exploration and exploitation. While there has been substantial progress in BO methods, striking this balance remains a delicate process. In this light, we present LLAMBO, a novel approach that integrates the capabilities of Large Language Models (LLMs) within BO. At a high level, we frame the BO problem in natural language, enabling LLMs to iteratively propose and evaluate promising solutions conditioned on historical evaluations. More specifically, we explore how combining the contextual understanding, few-shot learning proficiency, and domain knowledge of LLMs can improve model-based BO. Our findings illustrate that LLAMBO is effective at zero-shot warmstarting, and enhances surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse. Our approach operates entirely in-context and does not require LLM finetuning. It is also modular by design, allowing individual components to be integrated into existing BO frameworks or to function cohesively as an end-to-end method. We empirically validate LLAMBO's efficacy on the problem of hyperparameter tuning, demonstrating strong performance across a range of diverse benchmark, proprietary, and synthetic tasks.
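To make the propose-and-evaluate loop described above concrete, here is a minimal sketch of an LLM-in-the-loop BO iteration. Everything in it is an illustrative assumption rather than LLAMBO's actual prompts or code: `query_llm` is a hypothetical stand-in for any chat-completion client, and the prompt wording, helper names, and history format are invented for exposition.

```python
import json

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; plug in any chat-completion
    client here. Not part of the paper's released implementation."""
    raise NotImplementedError("substitute your LLM client")

def format_history(history):
    """Serialize past (configuration, score) pairs as few-shot examples."""
    return "\n".join(
        f"config: {json.dumps(cfg)}, accuracy: {score:.4f}"
        for cfg, score in history
    )

def propose_candidate(task_description, history):
    """Ask the LLM for one promising configuration, conditioned on the
    natural-language task description and the observed history."""
    prompt = (
        f"{task_description}\n"
        "Past evaluations:\n"
        f"{format_history(history)}\n"
        "Propose one new configuration likely to improve accuracy. "
        "Answer with a JSON object only."
    )
    return json.loads(query_llm(prompt))

def llm_bo_loop(task_description, evaluate, n_iterations=20, warmstart=None):
    """Minimal LLM-driven BO loop: propose, evaluate the expensive
    black-box function, append to history, repeat."""
    history = list(warmstart or [])  # zero-shot warmstart points, if any
    for _ in range(n_iterations):
        candidate = propose_candidate(task_description, history)
        score = evaluate(candidate)  # expensive black-box evaluation
        history.append((candidate, score))
    return max(history, key=lambda pair: pair[1])
```

This sketch only shows the candidate-sampling skeleton; per the abstract, LLAMBO additionally uses the same in-context framing for surrogate modeling, i.e., having the LLM score candidates against historical evaluations rather than merely generate them.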