LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization (2306.01102v8)

Published 1 Jun 2023 in cs.NE, cs.AI, and cs.CL

Abstract: LLMs have emerged as powerful tools capable of accomplishing a broad spectrum of tasks. Their abilities span numerous areas, and one area where they have made a significant impact is in the domain of code generation. Here, we propose using the coding abilities of LLMs to introduce meaningful variations to code defining neural networks. Meanwhile, Quality-Diversity (QD) algorithms are known to discover diverse and robust solutions. By merging the code-generating abilities of LLMs with the diversity and robustness of QD solutions, we introduce \texttt{LLMatic}, a Neural Architecture Search (NAS) algorithm. While LLMs struggle to conduct NAS directly through prompts, \texttt{LLMatic} uses a procedural approach, leveraging QD for prompts and network architecture to create diverse and high-performing networks. We test \texttt{LLMatic} on the CIFAR-10 and NAS-bench-201 benchmarks, demonstrating that it can produce competitive networks while evaluating just $2,000$ candidates, even without prior knowledge of the benchmark domain or exposure to any previous top-performing models for the benchmark. The open-sourced code is available in \url{https://github.com/umair-nasir14/LLMatic}.

An Analysis of LLMatic: Neural Architecture Search via LLMs and Quality Diversity Optimization

The paper "LLMatic: Neural Architecture Search via LLMs and Quality Diversity Optimization" presents an algorithm, LLMatic, which integrates LLMs with Quality-Diversity (QD) optimization techniques to address the design challenges in Neural Architecture Search (NAS). This approach leverages the code generation capabilities of LLMs to introduce variations in neural network code and employs QD optimization to explore the search space effectively.

Overview of LLMatic

LLMatic is built on the premise that modern LLMs, having been trained on vast repositories of machine learning code, can propose viable neural network architectures. However, these models require an external mechanism to evaluate and iteratively improve the generated architectures. By combining the code-generation strengths of LLMs with the robustness of QD methods, LLMatic offers a systematic approach to NAS that yields both high-performing and diverse architectural solutions.

The primary components of LLMatic include:

  1. LLM-Driven Variations: LLMatic uses LLMs fine-tuned on code to generate architectural variations. Given a prompt, the LLM is tasked with modifying an existing network, thereby introducing diversity.
  2. Quality-Diversity Optimization: Two archives are maintained, one for networks and another for prompts. This dual-archive approach applies QD principles to retain a spectrum of solutions, balancing quality (performance) and diversity (variety in architecture) simultaneously; a minimal sketch of this loop is given below.
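
The loop below is a minimal sketch of how these two components might interact, assuming a MAP-Elites-style network archive keyed by a simple two-dimensional behaviour descriptor and a score-based prompt archive; the helpers llm_mutate and evaluate are illustrative placeholders rather than the authors' implementation.

```python
# Illustrative sketch of an LLM-driven quality-diversity loop (not the authors' exact code).
# Assumptions: a 2-D behaviour descriptor (two binned architecture properties) and a
# score-based prompt archive; `llm_mutate` and `evaluate` are toy stand-ins.
import random

def llm_mutate(network_code: str, prompt: str) -> str:
    # In LLMatic, this step would query a code-generating LLM with the prompt and the
    # current network definition; here the code is returned unchanged.
    return network_code

def evaluate(network_code: str) -> tuple[float, tuple[int, int]]:
    # In practice: briefly train the candidate and return (validation accuracy,
    # behaviour descriptor). Here both are faked so the loop runs end to end.
    rng = random.Random(hash(network_code) % (2**32))
    return rng.random(), (rng.randrange(4), rng.randrange(4))

def qd_search(seed_code: str, prompts: list[str], budget: int = 2000):
    net_archive: dict[tuple[int, int], tuple[float, str]] = {}  # niche -> (fitness, code)
    prompt_scores = {p: 0.0 for p in prompts}                   # prompt -> improvement count

    fit, desc = evaluate(seed_code)
    net_archive[desc] = (fit, seed_code)

    for _ in range(budget):
        _, parent = random.choice(list(net_archive.values()))
        # Bias prompt selection toward prompts that have produced improvements.
        prompt = max(prompt_scores, key=lambda p: prompt_scores[p] + random.random())

        child = llm_mutate(parent, prompt)
        fit, desc = evaluate(child)

        # MAP-Elites-style replacement: keep the child if its niche is empty
        # or it beats the current occupant of that niche.
        if desc not in net_archive or fit > net_archive[desc][0]:
            net_archive[desc] = (fit, child)
            prompt_scores[prompt] += 1.0

    return net_archive
```

In the actual system, the mutation step calls a code-focused LLM and the evaluation step trains the candidate network briefly; the replacement rule shown here is the standard MAP-Elites update.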

Experimental Evaluation

Experiments on the CIFAR-10 and NAS-Bench-201 benchmarks indicate that LLMatic can produce competitive architectures with only 2,000 candidate evaluations. Notably, LLMatic does so without requiring prior knowledge of the benchmark domain or access to previously top-performing models.

For CIFAR-10, LLMatic generated a wide range of architectures with competitive performance, as evidenced by the more than 20 strong networks identified during the experiments. On NAS-Bench-201, which offers a discretized, queryable search space, LLMatic achieved near-optimal results without exhaustive search, indicating its efficacy in constrained settings.
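
Because NAS-Bench-201 is a tabular benchmark, evaluating a candidate reduces to looking up precomputed training statistics rather than training from scratch, which is what makes a small evaluation budget practical. The snippet below is a hedged sketch of such a query using the publicly released nas_201_api package; the checkpoint filename, method names, and result keys are assumptions that may vary between versions of the benchmark.

```python
# Hedged sketch: querying NAS-Bench-201 for a candidate cell's precomputed results.
# Assumes the `nas_201_api` package and its released checkpoint file; method names
# and result keys may differ between versions of the benchmark API.
from nas_201_api import NASBench201API

api = NASBench201API("NAS-Bench-201-v1_1-096897.pth")  # path to the benchmark file (assumed name)

# A candidate cell in NAS-Bench-201's string encoding (one operation~input per edge).
arch = "|nor_conv_3x3~0|+|nor_conv_3x3~0|avg_pool_3x3~1|+|skip_connect~0|nor_conv_3x3~1|skip_connect~2|"

index = api.query_index_by_arch(arch)                  # table index of this architecture
info = api.get_more_info(index, "cifar10", hp="200")   # precomputed training statistics
print(info.get("test-accuracy"))                       # evaluation reduces to a table lookup
```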

Contributions and Implications

LLMatic’s novel contribution lies in its dual-archive integration of LLMs with QD search strategies, paving the way for a more informed and adaptable NAS methodology. It challenges traditional NAS paradigms by reducing the need for extensive trial-and-error or computationally expensive reinforcement learning approaches. This work suggests that future NAS pipelines could shift toward integrating pre-trained knowledge sources such as LLMs to improve efficiency.

Theoretically, this approach not only demonstrates the capability of using LLMs beyond text processing but also opens discussions on multi-modal applications where symbolic reasoning (neural architecture descriptions) is coupled with optimization tasks. Practically, LLMatic presents a scalable approach for NAS applications, especially in edge computing scenarios where computational resources are limited.

Future Developments

The research paves a path for exploring even larger LLMs and more comprehensive QD frameworks, potentially expanding the applicability of LLMatic to other domains such as natural language processing and robotics. Furthermore, fine-tuning LLMs specifically for NAS-related coding tasks and refining prompt-engineering techniques could improve LLMatic’s ability to discover even more optimized architectures.

Future research could further explore:

  • The integration of more sophisticated LLM-powered reasoning and problem-solving capabilities to enhance NAS exploration.
  • Application of this methodology on more complex datasets and architectures beyond typical benchmarks to assess scalability and flexibility.
  • Employing transfer learning and incremental updates within the LLMatic framework to reduce the search space further, increasing efficiency and potentially even outperforming state-of-the-art NAS methods.

Conclusion

LLMatic introduces an innovative way to tackle NAS challenges, marking a substantial stride in integrating LLMs with evolutionary design principles. By harnessing the innate knowledge of LLMs alongside robust QD optimization, LLMatic represents a pivotal development in the automated design process of neural network architectures, with favorable implications for both research and industry applications.

Authors (5)
  1. Muhammad U. Nasir
  2. Sam Earle
  3. Julian Togelius
  4. Steven James
  5. Christopher Cleghorn