- The paper introduces FABOLAS, which enhances Bayesian optimization by incorporating dataset size to cut computation time while ensuring accurate hyperparameter tuning.
- It leverages Gaussian Processes with a novel kernel to predict both loss and computational cost, achieving speedups of 10 to 100 times over traditional methods.
- Experiments on SVMs, CNNs, and residual networks confirm the method’s efficiency and scalability for large-scale machine learning applications.
Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
This paper introduces a method to accelerate Bayesian optimization for hyperparameter tuning in settings where large datasets make individual training runs prohibitively expensive. The key idea is a generative model of validation error as a function of training-set size, which lets the optimizer explore hyperparameter configurations on small subsets of the data and extrapolate their performance to the full dataset, significantly reducing the overall computation time of hyperparameter optimization.
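To make the extrapolation idea concrete, the short Python example below fits a simple parametric learning curve to validation errors measured on a few small subsets and predicts the full-data error without training on the full set. The power-law form and the numbers are illustrative assumptions; the paper itself uses a Gaussian-process model for this prediction.

```python
import numpy as np
from scipy.optimize import curve_fit

# Validation errors measured for one hyperparameter configuration on
# increasingly large subsets of the training data (illustrative numbers,
# not taken from the paper).
subset_fraction = np.array([0.05, 0.1, 0.2, 0.4])
val_error = np.array([0.31, 0.24, 0.19, 0.16])

# Simple power-law learning curve e(s) = a * s^(-b) + c, used here as a
# stand-in for the paper's GP-based model of error versus training-set size.
def learning_curve(s, a, b, c):
    return a * s ** (-b) + c

params, _ = curve_fit(learning_curve, subset_fraction, val_error,
                      p0=(0.1, 0.5, 0.1), maxfev=10000)

# Extrapolate to the full data set (s = 1) without ever training on it.
print("predicted full-data validation error:", learning_curve(1.0, *params))
```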
Methodology
The proposed method extends standard Bayesian optimization with an environmental variable representing the size of the training subset, so that cheap evaluations on small subsets inform predictions about performance on the complete dataset. The resulting procedure, FABOLAS, jointly models both the validation loss and the computational cost as functions of the hyperparameters and the subset size, and at each step trades off the expected information gain about the optimum against the predicted cost of the evaluation, allowing it to find good configurations faster than state-of-the-art methods.
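A minimal Python sketch of such a data-size-aware loop is shown below, built on scikit-learn Gaussian processes. The objective `train_and_eval(config, subset_fraction)` and the sampler `config_space_sample()` are hypothetical placeholders, and the acquisition scores candidates by expected improvement at the full dataset size divided by predicted cost, a simplified stand-in for the information-gain-per-cost criterion that FABOLAS actually optimizes.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

SUBSETS = (0.0625, 0.125, 0.25, 0.5)  # training-set fractions for cheap evaluations

def fabolas_style_loop(train_and_eval, config_space_sample, n_iters=50):
    """train_and_eval(config, s) -> (validation_loss, seconds) is a
    hypothetical objective; config_space_sample() draws a random config."""
    X, y_loss, y_cost = [], [], []

    # Initial design biased toward small, cheap subsets.
    for s in SUBSETS:
        for _ in range(2):
            cfg = config_space_sample()
            loss, secs = train_and_eval(cfg, s)
            X.append(np.append(cfg, s))
            y_loss.append(loss)
            y_cost.append(np.log(secs))  # model cost on a log scale

    gp_loss = GaussianProcessRegressor(Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp_cost = GaussianProcessRegressor(Matern(nu=2.5), alpha=1e-6, normalize_y=True)

    for _ in range(n_iters):
        gp_loss.fit(np.array(X), np.array(y_loss))
        gp_cost.fit(np.array(X), np.array(y_cost))
        incumbent = min(y_loss)

        # Score random candidates: expected improvement predicted at the full
        # data set (s = 1) divided by the predicted cost of evaluating at
        # (config, s). This is a simplified stand-in for the paper's
        # information-gain-per-cost acquisition.
        best = None
        for _ in range(200):
            cfg, s = config_space_sample(), np.random.choice(SUBSETS + (1.0,))
            mu, sigma = gp_loss.predict(np.append(cfg, 1.0).reshape(1, -1),
                                        return_std=True)
            z = (incumbent - mu[0]) / max(sigma[0], 1e-9)
            ei_full = sigma[0] * (z * norm.cdf(z) + norm.pdf(z))
            cost = np.exp(gp_cost.predict(np.append(cfg, s).reshape(1, -1))[0])
            if best is None or ei_full / cost > best[0]:
                best = (ei_full / cost, cfg, s)

        _, cfg, s = best
        loss, secs = train_and_eval(cfg, s)
        X.append(np.append(cfg, s))
        y_loss.append(loss)
        y_cost.append(np.log(secs))

    return X, y_loss
```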
Key Components
- Models: Gaussian Processes (GPs) forecast both the validation loss and the computational cost, using a novel kernel that incorporates the dataset size as an input. This lets the optimizer extrapolate from evaluations on small subsets and predict full-data performance without evaluating it directly (a minimal kernel sketch follows this list).
- Acquisition Function: FABOLAS uses an acquisition function inspired by Entropy Search (ES), weighing the expected information gain about the optimum at the full dataset size against the predicted wall-clock cost of each candidate evaluation.
- Initial Design: The initial design deliberately biases evaluations toward small, cheap subsets so that the loss and cost models can be fit before any expensive full-data runs are attempted.
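To illustrate how such a kernel can be made aware of the dataset size, the sketch below multiplies a standard Matérn-5/2 kernel over the hyperparameters with a Bayesian linear-regression kernel in the subset fraction s. The basis phi(s) = (1, (1 - s)^2) is an assumed choice that lets a GP extrapolate toward the full dataset (s = 1); it follows the spirit of the paper's kernel rather than reproducing it verbatim.

```python
import numpy as np

def matern52(X, Y, lengthscale=1.0):
    """Standard Matern-5/2 kernel over the hyperparameter dimensions."""
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)) / lengthscale
    return (1.0 + np.sqrt(5) * d + 5.0 / 3.0 * d ** 2) * np.exp(-np.sqrt(5) * d)

def size_kernel(s, s2):
    """Bayesian linear-regression kernel in the subset fraction s, with the
    assumed basis phi(s) = (1, (1 - s)^2) so the covariance structure can
    extrapolate from small subsets toward s = 1."""
    phi = lambda v: np.stack([np.ones_like(v), (1.0 - v) ** 2], axis=1)
    return phi(s) @ phi(s2).T

def data_size_aware_kernel(X, Y):
    """Product kernel over inputs whose last column is the subset fraction."""
    return matern52(X[:, :-1], Y[:, :-1]) * size_kernel(X[:, -1], Y[:, -1])

# Covariance between the same configuration evaluated on 1/8 of the data
# and on the full data set (hypothetical hyperparameter values).
X = np.array([[0.1, -2.0, 0.125],
              [0.1, -2.0, 1.0]])
print(data_size_aware_kernel(X, X))
```

Because the size component is a finite basis-function model, covariances involving s = 1 remain well behaved even when every observation so far comes from s < 1, which is what makes extrapolation to the full dataset possible.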
Experiments and Results
The paper conducts extensive experiments using support vector machines (SVMs) and deep neural networks, demonstrating substantial efficiency gains over existing methodologies:
- For SVMs on several datasets, FABOLAS found configurations of comparable quality 10 to 100 times faster than traditional Bayesian optimization and Hyperband.
- For convolutional neural networks on CIFAR-10 and SVHN, FABOLAS retained its advantage, largely because early evaluations on small subsets quickly identify promising regions of the hyperparameter space.
- On harder benchmarks such as residual networks, FABOLAS continued to outperform competing methods by a clear margin, underlining its applicability to modern deep learning models.
Implications and Future Work
FABOLAS is a potent advance in hyperparameter optimization, especially as dataset sizes continue to grow. Its strategy mirrors how human experts work, vetting configurations on small subsets before committing to expensive full-data runs, which makes it both robust and efficient and suggests broad applicability across machine learning domains. Promising future directions include extending the approach to other environmental variables, such as image resolution or the number of classes, and replacing the GPs with alternative models such as Bayesian neural networks to sidestep their cubic scaling in the number of observations.
The contribution of this research lies in its innovative approach to data-size-aware optimization, offering a scalable and efficient solution for tackling the computational challenges inherent in hyperparameter tuning for large-scale machine learning applications.