Using Large Language Models for Hyperparameter Optimization (2312.04528v2)

Published 7 Dec 2023 in cs.LG and cs.AI

Abstract: This paper explores the use of foundational LLMs in hyperparameter optimization (HPO). Hyperparameters are critical in determining the effectiveness of machine learning models, yet their optimization often relies on manual approaches in limited-budget settings. By prompting LLMs with dataset and model descriptions, we develop a methodology where LLMs suggest hyperparameter configurations, which are iteratively refined based on model performance. Our empirical evaluations reveal that within constrained search budgets, LLMs can match or outperform traditional HPO methods like Bayesian optimization across different models on standard benchmarks. Furthermore, we propose treating the code specifying our model as a hyperparameter, which the LLM outputs and which affords greater flexibility than existing HPO approaches.

Introduction

In machine learning, hyperparameters are critical non-trainable settings that influence the performance and effectiveness of models. They include choices such as the type of architecture, the degree of regularization, and which optimization method to use. Hyperparameter optimization (HPO) seeks the best set of these settings for a given problem. Traditional methods for HPO, like random search or Bayesian optimization, have limitations, notably the need to hand-specify a search space and their inefficiency in the early, uninformed phase of the search. This paper explores the potential of leveraging LLMs for HPO.
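For context, the baseline setup the paper compares against can be pictured with the following minimal sketch: random search over a hand-specified search space. The bounds and the `train_and_evaluate` callable are illustrative assumptions, not details taken from the paper.

```python
import math
import random

# Hypothetical hand-specified search space; the bounds are illustrative
# assumptions, not taken from the paper.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-1),  # sampled log-uniformly
    "weight_decay": (1e-6, 1e-2),   # sampled log-uniformly
    "num_layers": (1, 4),           # sampled uniformly over integers
}

def sample_config():
    """Draw one random configuration from the search space."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    wd_lo, wd_hi = SEARCH_SPACE["weight_decay"]
    nl_lo, nl_hi = SEARCH_SPACE["num_layers"]
    return {
        "learning_rate": 10 ** random.uniform(math.log10(lr_lo), math.log10(lr_hi)),
        "weight_decay": 10 ** random.uniform(math.log10(wd_lo), math.log10(wd_hi)),
        "num_layers": random.randint(nl_lo, nl_hi),
    }

def random_search(train_and_evaluate, budget=10):
    """Evaluate `budget` random configurations and return the best one found.

    `train_and_evaluate` is an assumed user-supplied callable mapping a
    configuration dict to a validation loss (lower is better).
    """
    best_config, best_loss = None, float("inf")
    for _ in range(budget):
        config = sample_config()
        loss = train_and_evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss
```

Bayesian optimization replaces the blind sampling step with a surrogate model of the loss surface, but it still requires the same hand-specified search space and spends its first evaluations with little information, which is the gap the LLM-based approach targets.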

Methodology

The paper proposes a novel approach in which LLMs, specifically GPT variants, are used for HPO tasks. The LLM is prompted to suggest hyperparameters and is then given feedback on the resulting model performance, which informs its subsequent suggestions; the process repeats until the search budget is exhausted. The paper also introduces using LLMs to generate complete training code, treating the code itself as an adaptable hyperparameter, which allows more flexible exploration beyond a predefined hyperparameter space. Empirical evaluations compare LLMs to traditional HPO methods on standard benchmarks, assessing efficiency and performance over multiple iterations.
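The iterative loop can be pictured with the following minimal sketch. Here `query_llm` (a function that sends a prompt to a chat model and returns its text reply) and `train_and_evaluate` (which maps a configuration to a validation loss) are assumed placeholders, and the prompt wording is illustrative rather than the paper's exact prompt.

```python
import json

def llm_guided_hpo(query_llm, train_and_evaluate, budget=10):
    """Minimal sketch of an iterative LLM-driven HPO loop.

    `query_llm` and `train_and_evaluate` are assumed placeholders; the
    prompt text is illustrative, not the paper's exact prompt.
    """
    prompt = (
        "You are helping tune a small neural network on a tabular "
        "classification dataset. Reply with a JSON object containing "
        "'learning_rate', 'weight_decay', and 'num_layers'."
    )
    best_config, best_loss = None, float("inf")
    for _ in range(budget):
        reply = query_llm(prompt)
        config = json.loads(reply)  # assumes the model returns valid JSON
        loss = train_and_evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
        # Feed the observed result back so the next suggestion is informed.
        prompt = (
            f"Your previous suggestion {config} achieved validation loss "
            f"{loss:.4f}. Propose an improved configuration in the same "
            "JSON format."
        )
    return best_config, best_loss
```

In practice the reply would need validation and retry logic, since the model may return malformed JSON or out-of-range values, but the essential structure is a prompt, a parsed suggestion, an evaluation, and feedback.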

Results

The findings demonstrate that, within a limited search budget, LLMs can suggest hyperparameters that yield model performance comparable to or better than that of traditional optimization methods. The paper analyzes both simple 2D toy problems, used to visualize the optimization process, and realistic HPO benchmarks. When tasked with generating training code, LLMs produced results that outperformed random search in early evaluations, suggesting they can effectively narrow the search space. The paper also observes a positive impact from 'chain-of-thought' prompting, where asking the model to provide reasoning alongside its suggestions slightly improved outcomes.
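One way to picture the chain-of-thought variant is to ask the model to reason before committing to a configuration, then parse only the final JSON object from its reply. The prompt wording and parsing below are hypothetical illustrations, not the paper's exact implementation.

```python
import json

def build_cot_prompt(history):
    """Assemble a chain-of-thought style prompt from past (config, loss) pairs.

    The wording is an illustrative assumption, not the paper's exact prompt.
    """
    lines = ["You are tuning hyperparameters for a small neural network."]
    for config, loss in history:
        lines.append(f"Configuration {config} reached validation loss {loss:.4f}.")
    lines.append(
        "Think step by step about which hyperparameters to change and why, "
        "then end your reply with a single flat JSON object giving the new "
        "configuration."
    )
    return "\n".join(lines)

def parse_final_json(reply):
    """Extract the trailing JSON object from a reply that may contain reasoning.

    Assumes the configuration is a flat (non-nested) JSON object placed at
    the end of the reply, as the prompt above requests.
    """
    start, end = reply.rfind("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON configuration found in the model's reply.")
    return json.loads(reply[start:end + 1])
```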

Conclusion and Related Work

LLMs show promise as hyperparameter tuning assistants, achieving good performance and efficiently navigating the search space. The paper acknowledges existing hyperparameter optimization techniques and their challenges, placing the LLM approach in the broader context of HPO research. While recognizing limitations such as the potential for dataset contamination and cost restrictions associated with using commercial LLMs, the authors are optimistic about LLMs becoming more versatile and cost-effective for HPO applications in the future.

Authors (5)
  1. Michael R. Zhang
  2. Nishkrit Desai
  3. Juhan Bae
  4. Jonathan Lorraine
  5. Jimmy Ba