An Overview of Pre-trained Gaussian Processes for Bayesian Optimization
The paper "Pre-trained Gaussian Processes for Bayesian Optimization" addresses a significant aspect of optimizing complex functions in real-world scenarios where traditional methods may fall short due to lack of domain-specific priors. The authors propose an alternative framework, HyperBO, which involves pre-training Gaussian Processes (GPs) using prior data from related tasks to effectively inform Bayesian Optimization (BO) processes without requiring manual specification of priors.
Core Contributions
The paper contributes to Bayesian optimization by introducing HyperBO, a methodology that uses pre-trained GPs to improve BO performance on hyperparameter tuning and other expensive black-box optimization problems. The core contributions are:
- Pre-training of Gaussian Processes: The authors propose pre-training a GP with a Kullback-Leibler (KL) divergence-based loss that aligns the learned prior with the distribution of functions observed in prior tasks, enabling more accurate prediction and faster optimization on new, related tasks. A minimal sketch of this objective appears after this list.
- Theoretical Insights: The paper provides theoretical guarantees for HyperBO, showing bounded posterior predictions and near-zero regret even without access to the ground-truth GP prior. This framework is important for understanding when and why HyperBO is effective in practice, and where it may fall short.
- Large-Scale Hyperparameter Tuning Dataset: The authors created and utilized a large multi-task hyperparameter tuning dataset, PD1, consisting of tens of thousands of hyperparameter configuration evaluations across various model-dataset combinations. This dataset is pivotal for evaluating the empirical performance of HyperBO against other state-of-the-art methods in hyperparameter tuning.
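To make the KL-based pre-training objective concrete, here is a minimal NumPy sketch under simplifying assumptions the paper does not require: every prior task is evaluated at the same inputs X, the GP uses a constant mean and an RBF kernel, and observation noise is folded into a small jitter term. All names (rbf_kernel, pretraining_loss, etc.) are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy setup: M prior tasks, each evaluated at the SAME n inputs X (the paper handles
# more general data). Y has shape (M, n): Y[i, j] is the value of task i's objective at X[j].

def rbf_kernel(A, B, lengthscale, signal_var):
    """Squared-exponential kernel between row-wise inputs A (n, d) and B (m, d)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def kl_mvn(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) between two multivariate Gaussians."""
    n = len(mu0)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - n
                  + np.linalg.slogdet(cov1)[1] - np.linalg.slogdet(cov0)[1])

def pretraining_loss(theta, X, Y, jitter=1e-6):
    """KL between the empirical Gaussian over prior-task values and the model GP at X."""
    const_mean, log_ls, log_var = theta
    mu_hat = Y.mean(axis=0)                                            # empirical mean
    cov_hat = np.cov(Y, rowvar=False) + jitter * np.eye(Y.shape[1])    # empirical covariance
    mu_model = np.full(Y.shape[1], const_mean)                         # GP mean function at X
    cov_model = rbf_kernel(X, X, np.exp(log_ls), np.exp(log_var)) + jitter * np.eye(Y.shape[1])
    return kl_mvn(mu_hat, cov_hat, mu_model, cov_model)

# Fit the prior by minimizing the KL loss over the GP hyperparameters (toy random data).
X = np.random.rand(20, 3)     # 20 shared evaluation points in a 3-dimensional search space
Y = np.random.rand(50, 20)    # 50 prior tasks, each evaluated at those 20 points
result = minimize(pretraining_loss, x0=np.zeros(3), args=(X, Y), method="L-BFGS-B")
```

The key idea the sketch illustrates is that the pre-training target is a distribution over function values pooled across tasks, rather than the observations of any single task.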
Methodology
HyperBO first pre-trains a GP on a large collection of evaluations from diverse but related tasks, then uses that GP as an informed surrogate model for Bayesian optimization on the new task. Both the mean function and the kernel of the GP are fit to the multi-task data, with a KL divergence-based loss used to match the learned prior to the empirical distribution of function values across tasks. A schematic of the resulting optimization loop is given below.
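The following sketch shows how a pre-trained prior might drive the tuning phase. It is a schematic illustration under assumed simplifications (a discrete candidate set, a constant-mean RBF GP, expected improvement as the acquisition function), not the authors' released code; names such as theta_star and bo_with_pretrained_prior are hypothetical.

```python
import numpy as np
from scipy.stats import norm

# The pre-trained mean/kernel hyperparameters (theta_star) are frozen and reused as the
# GP prior for a new task; only this acquisition loop runs at tuning time.

def rbf_kernel(A, B, lengthscale, signal_var):
    """Squared-exponential kernel between row-wise inputs A (n, d) and B (m, d)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def gp_posterior(X_obs, y_obs, X_cand, const_mean, lengthscale, signal_var, noise=1e-4):
    """Posterior mean and std at candidate points under the fixed, pre-trained GP prior."""
    K_oo = rbf_kernel(X_obs, X_obs, lengthscale, signal_var) + noise * np.eye(len(X_obs))
    K_oc = rbf_kernel(X_obs, X_cand, lengthscale, signal_var)
    K_cc = rbf_kernel(X_cand, X_cand, lengthscale, signal_var)
    mean = const_mean + K_oc.T @ np.linalg.solve(K_oo, y_obs - const_mean)
    cov = K_cc - K_oc.T @ np.linalg.solve(K_oo, K_oc)
    return mean, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

def expected_improvement(mean, std, best_y):
    """EI acquisition for maximization."""
    z = (mean - best_y) / std
    return (mean - best_y) * norm.cdf(z) + std * norm.pdf(z)

def bo_with_pretrained_prior(objective, X_cand, theta_star, num_rounds=20):
    """Maximize `objective` over a discrete candidate set using the pre-trained GP."""
    const_mean, lengthscale, signal_var = theta_star
    X_obs = X_cand[:1]                                  # seed with a single evaluation
    y_obs = np.array([objective(X_cand[0])])
    for _ in range(num_rounds):
        mean, std = gp_posterior(X_obs, y_obs, X_cand, const_mean, lengthscale, signal_var)
        x_next = X_cand[np.argmax(expected_improvement(mean, std, y_obs.max()))]
        X_obs = np.vstack([X_obs, x_next])
        y_obs = np.append(y_obs, objective(x_next))
    return X_obs[np.argmax(y_obs)], y_obs.max()
```

The design point to notice is that the GP hyperparameters are not re-estimated from the handful of observations on the new task; the information gathered from prior tasks is carried entirely by the frozen mean and kernel.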
Empirical Evaluation
Extensive empirical evaluations were conducted on both the newly introduced PD1 dataset and established benchmarks such as HPO-B. The results show that HyperBO significantly outperforms baseline methods in both the speed and the final quality of optimization; the authors report that it finds good hyperparameter configurations roughly three times more efficiently than competing methods.
Practical and Theoretical Implications
- Practical Applications: HyperBO provides a scalable and effective mechanism for hyperparameter tuning, particularly in deep learning settings where model complexity and dataset scale make each evaluation expensive. By shifting from handcrafted priors to data-driven priors, it broadens the applicability of Bayesian optimization across domains.
- Theoretical Implications: The work advances theoretical understanding by establishing regret bounds in the absence of a known ground-truth GP prior. It identifies conditions under which pre-trained models can reliably stand in for true priors in Bayesian inference, laying a foundation for similar analyses of other probabilistic models. The notion of regret involved is recalled below.
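For reference, guarantees of this kind are typically stated in terms of the simple regret after T evaluations; the exact constants and assumptions of the paper's bound are not reproduced here, only the standard definition:

```latex
% Simple regret after T evaluations of the objective f, where x_t is the point
% queried at step t and x^* is a global maximizer over the search space \mathcal{X}:
r_T = f(x^*) - \max_{1 \le t \le T} f(x_t), \qquad x^* \in \arg\max_{x \in \mathcal{X}} f(x)
```

"Near-zero regret" then means that r_T shrinks toward a small constant close to zero as T grows, even though the pre-trained prior only approximates the unknown ground-truth GP.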
Future Directions
The paper opens several avenues for further research. Future work could extend HyperBO to handle dynamic and asynchronous task environments, investigate integration with non-GP surrogate models, or explore augmentation strategies for training datasets to increase the robustness of the learned priors. Additionally, scaling HyperBO to even larger task domains and refining its ability to generalize across diverse tasks remain promising directions.
In conclusion, the paper contributes a novel approach to Bayesian optimization by leveraging pre-trained Gaussian processes, providing both theoretical insights and practical gains in hyperparameter optimization. This methodology holds potential for broader applicability and sets the stage for evolving the role of BO in machine learning and beyond.