BOHB: Robust and Efficient Hyperparameter Optimization at Scale (1807.01774v1)

Published 4 Jul 2018 in cs.LG and stat.ML

Abstract: Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other hand, bandit-based configuration evaluation approaches based on random search lack guidance and do not converge to the best configurations as quickly. Here, we propose to combine the benefits of both Bayesian optimization and bandit-based methods, in order to achieve the best of both worlds: strong anytime performance and fast convergence to optimal configurations. We propose a new practical state-of-the-art hyperparameter optimization method, which consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types, including high-dimensional toy functions, support vector machines, feed-forward neural networks, Bayesian neural networks, deep reinforcement learning, and convolutional neural networks. Our method is robust and versatile, while at the same time being conceptually simple and easy to implement.

Summary

  • The paper presents a hybrid method that fuses Bayesian optimization with Hyperband to deliver robust and efficient hyperparameter tuning.
  • It leverages a Tree-structured Parzen Estimator for model-based sampling and uses successive halving to allocate computational resources efficiently.
  • Empirical results show BOHB converging up to 100x faster than traditional HPO methods while matching or exceeding their final performance across a range of machine learning tasks.

BOHB: Robust and Efficient Hyperparameter Optimization at Scale

The paper "BOHB: Robust and Efficient Hyperparameter Optimization at Scale" presents a novel method for hyperparameter optimization (HPO) in machine learning. It aims to address several desiderata that are essential for practical and efficient HPO: strong anytime performance, strong final performance, effective use of parallel resources, scalability, and robustness across different types of hyperparameter optimization problems.

Methodology

To achieve these objectives, the authors propose BOHB, a method that combines the strengths of Bayesian Optimization (BO) and Hyperband (HB). Traditional BO methods offer strong final performance but suffer from scalability issues, particularly in high-dimensional spaces and when handling mixed continuous and categorical hyperparameters. Hyperband, on the other hand, makes effective use of parallel resources and delivers strong anytime performance, but its reliance on random search limits its final performance.

Bayesian Optimization and Hyperband

BOHB leverages a Tree-structured Parzen Estimator (TPE) for its Bayesian optimization component, an approach known for its robustness and scalability relative to Gaussian processes (GPs). TPE fits separate densities over the configurations that performed well and those that performed poorly, and proposes new configurations where the ratio of the "good" density to the "bad" density is high.
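A minimal sketch of this density-ratio idea for a single continuous hyperparameter may make it concrete. The function and parameter names below are illustrative, not the authors' API, and the paper's actual implementation fits multivariate kernel density estimates over the full configuration space:

```python
import numpy as np
from scipy.stats import gaussian_kde

def tpe_propose(configs, losses, gamma=0.15, n_candidates=64, seed=None):
    """Illustrative TPE-style proposal for one continuous hyperparameter.

    Splits the observation history at the gamma-quantile of the losses,
    fits a kernel density estimate to each side, and returns the candidate
    maximizing the ratio l(x) / g(x) of 'good' to 'bad' density.
    """
    configs = np.asarray(configs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    cut = np.quantile(losses, gamma)
    good = configs[losses <= cut]   # configurations that performed well
    bad = configs[losses > cut]     # configurations that performed poorly
    l, g = gaussian_kde(good), gaussian_kde(bad)  # each side needs >= 2 distinct points
    candidates = l.resample(n_candidates, seed=seed).ravel()  # draw from the 'good' density
    ratio = l(candidates) / np.maximum(g(candidates), 1e-12)  # avoid division by zero
    return candidates[np.argmax(ratio)]
```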

Hyperband is utilized for its bandit-based approach to budget allocation, using Successive Halving (SH) to iteratively allocate more resources to configurations that show promise, thereby efficiently using computational budgets.
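The core SH loop is short enough to sketch directly; `evaluate(config, budget)` stands in for whatever training routine returns a validation loss at a given budget, and all names here are illustrative:

```python
def successive_halving(configs, evaluate, min_budget=1.0, eta=3):
    """Illustrative Successive Halving loop (lower loss is better)."""
    budget = min_budget
    while len(configs) > 1:
        losses = [evaluate(c, budget) for c in configs]
        order = sorted(range(len(configs)), key=lambda i: losses[i])
        keep = max(1, len(configs) // eta)   # advance only the top 1/eta
        configs = [configs[i] for i in order[:keep]]
        budget *= eta                        # survivors get eta times more budget
    return configs[0]
```

Hyperband then wraps this loop, restarting it with different trade-offs between the number of configurations and the minimum budget per configuration.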

Combination of BO and Hyperband

BOHB modifies Hyperband by replacing the random sampling of configurations with model-guided sampling. Specifically, it builds models based on the TPE to sample configurations in a way that balances exploration and exploitation effectively. This new method begins with random configurations but quickly shifts to model-based sampling as more data becomes available.
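That sampling rule can be sketched as follows; `model_propose` and `random_config` are placeholder callables (e.g. a TPE-style proposer and a uniform sampler over the search space), and the `min_points` and `random_fraction` values are illustrative rather than the paper's exact settings:

```python
import random

def bohb_next_config(history, model_propose, random_config,
                     min_points=10, random_fraction=1/3):
    """Illustrative BOHB sampling rule.

    Fall back to uniform random sampling until `history` holds enough
    observations to fit a model, and keep drawing a fixed fraction of
    random configurations afterwards so that random-search-style
    exploration never stops entirely.
    """
    if len(history) < min_points or random.random() < random_fraction:
        return random_config()       # cold start / sustained exploration
    return model_propose(history)    # model-guided exploitation
```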

The paper provides a thorough algorithmic description, explaining how BOHB maintains simplicity, computational efficiency, and ease of implementation. A notable feature is its ability to handle mixed types of hyperparameters and make effective use of parallel computing resources, ensuring scalability.

Empirical Evaluation

The empirical evaluation spans diverse tasks, including high-dimensional synthetic benchmarks, feed-forward neural networks on OpenML datasets, support vector machines, Bayesian neural networks, reinforcement learning agents, and convolutional neural networks. BOHB consistently outperforms baselines such as random search (RS), TPE, and Hyperband, converging to optimal configurations faster.

Significant results include:

  • On high-dimensional spaces like the Counting Ones problem, BOHB found optimal solutions significantly faster than TPE and HB.
  • For the hyperparameters of feed-forward neural networks, BOHB found configurations with the best validation performance up to 100 times faster than HB.
  • In reinforcement learning tasks, BOHB demonstrated substantial improvements in the convergence speed and final performance compared to TPE and HB.

Implications and Future Directions

BOHB's ability to integrate Bayesian modeling with bandit-based resource allocation addresses the practical challenges faced in training large, state-of-the-art models efficiently. From a theoretical perspective, it introduces a sound methodology that combines the benefits of different HPO strategies, highlighting the synergy between model-based optimization and bandit strategies.

Practically, BOHB can be applied in any machine learning domain requiring extensive hyperparameter searches, including, but not limited to, neural architecture search, tuning of machine learning pipelines, and optimizing learning algorithms for specific tasks.

Future research directions may explore automatic adaptation of budget allocations and further enhancements in model-based optimization techniques to improve the robustness and efficiency of BOHB in even more diversified and complex settings.

Overall, the paper introduces an efficient, robust, and scalable HPO method, backed by comprehensive empirical evaluations, making it a valuable contribution to the field of machine learning optimization. Its open-source availability ensures that BOHB can be readily adopted and extended by the research community.