Large Language Model Agent for Hyper-Parameter Optimization (2402.01881v2)
Abstract: Hyperparameter optimization is critical in modern machine learning, requiring expert knowledge, numerous trials, and high computational and human resources. Despite advances in Automated Machine Learning (AutoML), challenges in trial efficiency, setup complexity, and interpretability still persist. To address these issues, we introduce a novel paradigm that leverages LLMs to automate hyperparameter optimization across diverse machine learning tasks, named AgentHPO (LLM Agent-based Hyperparameter Optimization). Specifically, AgentHPO processes the task information autonomously, conducts experiments with specific hyperparameters (HPs), and iteratively optimizes them based on historical trials. Compared to traditional AutoML methods, this human-like optimization process greatly reduces the number of required trials, simplifies setup, and improves interpretability and user trust. Extensive empirical experiments on 12 representative machine learning tasks indicate that AgentHPO not only matches but often surpasses the best human trials in performance while providing explainable results. Further analysis sheds light on the strategies the LLM employs when optimizing these tasks, highlighting its effectiveness and adaptability across scenarios.
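The abstract describes an iterative loop in which an LLM agent reads the task description, proposes hyperparameters, observes the resulting trial, and refines its next proposal from the accumulated history. Below is a minimal Python sketch of such a loop. It is an illustration only, not the authors' implementation: the function names (`propose_hyperparameters`, `run_trial`, `agent_hpo`) are hypothetical, and the LLM call is replaced by a toy heuristic so the example runs standalone.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    hyperparameters: dict
    score: float
    notes: str = ""

def propose_hyperparameters(task_description: str, history: list[Trial]) -> dict:
    """Stand-in for the LLM agent: given the task description and the log of past
    trials, return the next hyperparameter configuration to try. Here we simply
    perturb the best configuration so the loop runs without an LLM backend."""
    if not history:
        return {"learning_rate": 1e-3, "batch_size": 32, "weight_decay": 1e-4}
    best = max(history, key=lambda t: t.score)
    hp = dict(best.hyperparameters)
    hp["learning_rate"] *= 0.5  # placeholder for LLM reasoning over the history
    return hp

def run_trial(hp: dict) -> float:
    """Stand-in for a real training run that returns a validation metric.
    Toy objective: rewards learning rates close to 1e-4."""
    return -abs(hp["learning_rate"] - 1e-4)

def agent_hpo(task_description: str, budget: int = 10) -> Trial:
    """Iterative optimization: propose, evaluate, record, repeat, return the best trial."""
    history: list[Trial] = []
    for _ in range(budget):
        hp = propose_hyperparameters(task_description, history)
        score = run_trial(hp)
        history.append(Trial(hp, score, notes=f"tried {hp}"))
    return max(history, key=lambda t: t.score)

if __name__ == "__main__":
    best = agent_hpo("Image classification, metric: validation accuracy")
    print("Best configuration:", best.hyperparameters, "score:", best.score)
```

In an actual AgentHPO-style system, `propose_hyperparameters` would be an LLM prompt containing the task description and the trial history, and `run_trial` would train the target model with the proposed configuration.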