Large Language Models Synergize with Automated Machine Learning (2405.03727v3)
Abstract: Program synthesis driven by large language models (LLMs) has recently become increasingly popular, but synthesizing programs for machine learning (ML) tasks remains challenging. This paper explores a novel form of program synthesis, targeting ML programs, by combining LLMs and automated machine learning (autoML). Specifically, our goal is to fully automate the generation and optimization of the code of the entire ML workflow, from data preparation to modeling and post-processing, using only a textual description of the ML task. To manage the length and diversity of ML programs, we propose to break each ML program into smaller, manageable parts. Each part is generated separately by the LLM, with careful consideration of compatibility among parts, which we ensure through a testing technique designed for ML programs. Unlike traditional program synthesis, which typically relies on binary evaluations (i.e., correct or incorrect), evaluating ML programs requires numerical judgments. Our approach therefore automates the numerical evaluation and optimization of these programs, selecting the best candidates through autoML techniques. In experiments across 12 ML tasks, our method outperforms existing methods for generating ML programs on 10 tasks, and autoML significantly improves the performance of the generated programs. Given only the textual task description, our method, Text-to-ML, generates the complete and optimized ML program in a fully autonomous process. The implementation of our method is available at https://github.com/JLX0/LLM-automl.
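To make the pipeline the abstract outlines concrete, below is a minimal sketch of its generate-test-optimize loop. All names here (`generate_part_candidates`, `passes_compatibility_tests`, `evaluate_program`) are hypothetical placeholders, and the random-search inner loop merely stands in for the autoML step; this is not the authors' implementation, which is available at the GitHub link above.

```python
import itertools
import random

def generate_part_candidates(task_description, part_name, n_candidates=3):
    # Hypothetical stand-in for an LLM call: the real system would prompt a
    # model with the task description and return candidate source code for
    # one part of the ML workflow.
    return [f"# candidate {i} for {part_name}\n" for i in range(n_candidates)]

def passes_compatibility_tests(program_parts):
    # Placeholder for the paper's testing technique: checks that the parts
    # assemble into a program that runs end to end without errors.
    return True

def evaluate_program(program_parts, hyperparams):
    # Placeholder numerical evaluation: the real system would execute the
    # assembled program and return a validation metric.
    return random.random()

def text_to_ml(task_description,
               parts=("data_preparation", "modeling", "post_processing"),
               n_trials=20):
    # 1. Generate candidate implementations for each part separately.
    candidates = {p: generate_part_candidates(task_description, p) for p in parts}
    best_score, best_program = float("-inf"), None
    # 2. Enumerate combinations of parts, keeping only compatible ones.
    for combo in itertools.product(*(candidates[p] for p in parts)):
        if not passes_compatibility_tests(combo):
            continue
        # 3. Numerically optimize each surviving program; plain random search
        #    here, where the paper applies autoML techniques.
        for _ in range(n_trials):
            hyperparams = {"lr": 10 ** random.uniform(-4, -1)}
            score = evaluate_program(combo, hyperparams)
            if score > best_score:
                best_score, best_program = score, combo
    return best_program, best_score

program, score = text_to_ml("Classify CIFAR-10 images from raw pixels.")
```

Generating parts independently and filtering their combinations with compatibility tests keeps each LLM call short and focused, while the numerical inner loop replaces the binary pass/fail judgment of traditional program synthesis with a metric that autoML can optimize.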
Authors: Jinglue Xu, Zhen Liu, Nagar Anthel Venkatesh Suryanarayanan, Hitoshi Iba, Jialong Li, Guoyuan Zhou, Jia Guo, Kenji Tei