Using Large Language Models for Hyperparameter Optimization (2312.04528v2)

Published 7 Dec 2023 in cs.LG and cs.AI

Abstract: This paper explores the use of foundational LLMs in hyperparameter optimization (HPO). Hyperparameters are critical in determining the effectiveness of machine learning models, yet their optimization often relies on manual approaches in limited-budget settings. By prompting LLMs with dataset and model descriptions, we develop a methodology where LLMs suggest hyperparameter configurations, which are iteratively refined based on model performance. Our empirical evaluations reveal that within constrained search budgets, LLMs can match or outperform traditional HPO methods like Bayesian optimization across different models on standard benchmarks. Furthermore, we propose treating the code specifying our model as a hyperparameter, which the LLM outputs and which affords greater flexibility than existing HPO approaches.

Introduction

In machine learning, hyperparameters are critical non-trainable settings that influence the performance and effectiveness of models. They include choices such as the type of architecture, the degree of regularization, and which optimization method to use. Hyperparameter optimization (HPO) seeks the best set of these settings for a given problem. Traditional methods for HPO, like random search or Bayesian optimization, have limitations, notably the need to hand-specify a search space and their inefficiency in the early, uninformed phase of the search. This paper explores the potential of leveraging LLMs for HPO.
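For context, the baseline setup the paper compares against can be pictured with the following minimal sketch: random search over a hand-specified search space. The bounds and the `train_and_evaluate` callable are illustrative assumptions, not details taken from the paper.

```python
import math
import random

# Hypothetical hand-specified search space; the bounds are illustrative
# assumptions, not taken from the paper.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-1),  # sampled log-uniformly
    "weight_decay": (1e-6, 1e-2),   # sampled log-uniformly
    "num_layers": (1, 4),           # sampled uniformly over integers
}

def sample_config():
    """Draw one random configuration from the search space."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    wd_lo, wd_hi = SEARCH_SPACE["weight_decay"]
    nl_lo, nl_hi = SEARCH_SPACE["num_layers"]
    return {
        "learning_rate": 10 ** random.uniform(math.log10(lr_lo), math.log10(lr_hi)),
        "weight_decay": 10 ** random.uniform(math.log10(wd_lo), math.log10(wd_hi)),
        "num_layers": random.randint(nl_lo, nl_hi),
    }

def random_search(train_and_evaluate, budget=10):
    """Evaluate `budget` random configurations and return the best one found.

    `train_and_evaluate` is an assumed user-supplied callable mapping a
    configuration dict to a validation loss (lower is better).
    """
    best_config, best_loss = None, float("inf")
    for _ in range(budget):
        config = sample_config()
        loss = train_and_evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss
```

Bayesian optimization replaces the blind sampling step with a surrogate model of the loss surface, but it still requires the same hand-specified search space and spends its first evaluations with little information, which is the gap the LLM-based approach targets.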

Methodology

The paper proposes a novel approach in which LLMs, specifically GPT variants, are used for HPO tasks. The LLM is prompted to suggest hyperparameters and is then given feedback on the resulting model performance, which informs its subsequent suggestions; the process repeats until the search budget is exhausted. The paper also introduces using LLMs to generate complete training code, treating the code itself as an adaptable hyperparameter, which allows more flexible exploration beyond a predefined hyperparameter space. Empirical evaluations compare LLMs to traditional HPO methods on standard benchmarks, assessing efficiency and performance over multiple iterations.
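The iterative loop can be pictured with the following minimal sketch. Here `query_llm` (a function that sends a prompt to a chat model and returns its text reply) and `train_and_evaluate` (which maps a configuration to a validation loss) are assumed placeholders, and the prompt wording is illustrative rather than the paper's exact prompt.

```python
import json

def llm_guided_hpo(query_llm, train_and_evaluate, budget=10):
    """Minimal sketch of an iterative LLM-driven HPO loop.

    `query_llm` and `train_and_evaluate` are assumed placeholders; the
    prompt text is illustrative, not the paper's exact prompt.
    """
    prompt = (
        "You are helping tune a small neural network on a tabular "
        "classification dataset. Reply with a JSON object containing "
        "'learning_rate', 'weight_decay', and 'num_layers'."
    )
    best_config, best_loss = None, float("inf")
    for _ in range(budget):
        reply = query_llm(prompt)
        config = json.loads(reply)  # assumes the model returns valid JSON
        loss = train_and_evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
        # Feed the observed result back so the next suggestion is informed.
        prompt = (
            f"Your previous suggestion {config} achieved validation loss "
            f"{loss:.4f}. Propose an improved configuration in the same "
            "JSON format."
        )
    return best_config, best_loss
```

In practice the reply would need validation and retry logic, since the model may return malformed JSON or out-of-range values, but the essential structure is a prompt, a parsed suggestion, an evaluation, and feedback.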

Results

The findings demonstrate that, within a limited search budget, LLMs can suggest hyperparameters that yield model performance comparable to or better than that of traditional optimization methods. The paper analyzes both simple 2D toy problems, used to visualize the optimization process, and realistic HPO benchmarks. When tasked with generating training code, LLMs produced results that outperformed random search in early evaluations, suggesting they can effectively narrow the search space. The paper also observes a positive impact from 'chain-of-thought' prompting, where asking the model to provide reasoning alongside its suggestions slightly improved outcomes.
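One way to picture the chain-of-thought variant is to ask the model to reason before committing to a configuration, then parse only the final JSON object from its reply. The prompt wording and parsing below are hypothetical illustrations, not the paper's exact implementation.

```python
import json

def build_cot_prompt(history):
    """Assemble a chain-of-thought style prompt from past (config, loss) pairs.

    The wording is an illustrative assumption, not the paper's exact prompt.
    """
    lines = ["You are tuning hyperparameters for a small neural network."]
    for config, loss in history:
        lines.append(f"Configuration {config} reached validation loss {loss:.4f}.")
    lines.append(
        "Think step by step about which hyperparameters to change and why, "
        "then end your reply with a single flat JSON object giving the new "
        "configuration."
    )
    return "\n".join(lines)

def parse_final_json(reply):
    """Extract the trailing JSON object from a reply that may contain reasoning.

    Assumes the configuration is a flat (non-nested) JSON object placed at
    the end of the reply, as the prompt above requests.
    """
    start, end = reply.rfind("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON configuration found in the model's reply.")
    return json.loads(reply[start:end + 1])
```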

Conclusion and Related Work

LLMs show promise as hyperparameter tuning assistants, achieving good performance and efficiently navigating the search space. The paper acknowledges existing hyperparameter optimization techniques and their challenges, placing the LLM approach in the broader context of HPO research. While recognizing limitations such as the potential for dataset contamination and cost restrictions associated with using commercial LLMs, the authors are optimistic about LLMs becoming more versatile and cost-effective for HPO applications in the future.

Authors (5)
  1. Michael R. Zhang
  2. Nishkrit Desai
  3. Juhan Bae
  4. Jonathan Lorraine
  5. Jimmy Ba