Executing Natural Language-Described Algorithms with Large Language Models: An Investigation (2403.00795v2)
Abstract: Executing computer programs described in natural language has long been a pursuit of computer science. With the enhanced natural language understanding capabilities exhibited by LLMs, the path toward this goal has been illuminated. In this paper, we examine the capacity of present-day LLMs to comprehend and execute algorithms outlined in natural language. We established an algorithm test set sourced from Introduction to Algorithms, a well-known textbook that contains many representative, widely used algorithms. To systematically assess LLMs' code execution abilities, we selected 30 algorithms, generated 300 randomly sampled instances in total, and evaluated whether popular LLMs can understand and execute these algorithms. Our findings reveal that LLMs, notably GPT-4, can effectively execute programs described in natural language, as long as no heavy numeric computation is involved. We believe our findings contribute to evaluating LLMs' code execution abilities and will encourage further investigation into, and application of, the computational power of LLMs.
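The evaluation described above can be sketched as a small harness: describe an algorithm in plain English, ask a model to execute it on a random instance, and score the answer against a reference implementation. The sketch below is illustrative only and not the authors' actual pipeline; `query_llm` is a hypothetical placeholder (here it returns the ground truth so the harness runs without API access), and the prompt wording and instance sizes are assumptions.

```python
import random

def insertion_sort(xs):
    """Reference implementation used to score the model's answer."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out

# Natural-language description of the algorithm, as the model would see it.
PROMPT = (
    "Execute the following algorithm step by step on the input and "
    "report only the final result.\n"
    "Algorithm: repeatedly take the next element and insert it into its "
    "correct position among the elements already processed.\n"
    "Input: {instance}\nFinal result:"
)

def query_llm(prompt):
    # Placeholder for a real chat-completion call. It parses the instance
    # back out of the prompt and returns the correct answer, so the
    # harness is runnable end to end without any model access.
    instance = eval(prompt.split("Input: ")[1].split("\n")[0])
    return str(insertion_sort(instance))

def evaluate(n_instances=10, seed=0):
    """Accuracy of the (placeholder) model over random instances."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_instances):
        instance = [rng.randint(0, 99) for _ in range(8)]
        answer = query_llm(PROMPT.format(instance=instance))
        correct += answer == str(insertion_sort(instance))
    return correct / n_instances

print(evaluate())  # 1.0 with the placeholder model
```

Swapping `query_llm` for a real API call and exact-match scoring against the reference output mirrors the kind of per-instance accuracy measurement the abstract describes.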
- Alfred V. Aho and John E. Hopcroft. 1974. The Design and Analysis of Computer Algorithms, 1st edition. Addison-Wesley Longman Publishing Co., Inc., USA.
- Palm 2 technical report.
- Code generation tools (almost) for free? A study of few-shot, pre-trained language models on code. CoRR, abs/2206.01335.
- Richard Bellman. 1958. On a routing problem. Quarterly of applied mathematics, 16(1):87–90.
- Jon Bentley. 1984. Programming pearls: Algorithm design techniques. Commun. ACM, 27(9):865–873.
- On the computational power of transformers and its implications in sequence modeling. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 455–475, Online. Association for Computational Linguistics.
- Corrado Böhm and Giuseppe Jacopini. 1966. Flow diagrams, Turing machines and languages with only two formation rules. Communications of the ACM, 9(5):366–371.
- Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.
- Sparks of artificial general intelligence: Early experiments with GPT-4. CoRR, abs/2303.12712.
- Evaluating large language models trained on code. CoRR, abs/2107.03374.
- Livio Colussi. 1994. Fastest pattern matching in strings. Journal of Algorithms, 16(2):163–189.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press.
- E. W. Dijkstra. 2022. A Note on Two Problems in Connexion with Graphs, 1st edition, pages 287–290. Association for Computing Machinery, New York, NY, USA.
- Robert W. Floyd. 1962. Algorithm 97: Shortest path. Commun. ACM, 5(6):345.
- Fănică Gavril. 1972. Algorithms for minimum coloring, maximum clique, minimum covering by cliques, and maximum independent set of a chordal graph. SIAM Journal on Computing, 1(2):180–187.
- R.L. Graham. 1972. An efficient algorithm for determining the convex hull of a finite planar set. Information Processing Letters, 1(4):132–133.
- Neural turing machines.
- Pre-trained models: Past, present and future. AI Open, 2:225–250.
- C. A. R. Hoare. 1961a. Algorithm 64: Quicksort. Commun. ACM, 4(7):321.
- C. A. R. Hoare. 1961b. Algorithm 65: Find. Commun. ACM, 4(7):321–322.
- An empirical analysis of compute-optimal large language model training. In Advances in Neural Information Processing Systems.
- R.A. Jarvis. 1973. On the identification of the convex hull of a finite set of points in the plane. Information Processing Letters, 2(1):18–21.
- GPT is becoming a Turing machine: Here are some ways to program it. CoRR, abs/2303.14310.
- Repair is nearly generation: Multilingual program repair with LLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 37(4):5131–5140.
- Scaling laws for neural language models. CoRR, abs/2001.08361.
- Donald E Knuth. 1973. Fundamental algorithms.
- Joseph B. Kruskal. 1956. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7(1):48–50.
- Eugene L Lawler. 1985. The traveling salesman problem: a guided tour of combinatorial optimization. Wiley-Interscience Series in Discrete Mathematics.
- Explaining competitive-level programming solutions using LLMs. CoRR, abs/2307.05337.
- Competition-level code generation with alphacode. Science, 378(6624):1092–1097.
- Code execution with pre-trained language models. arXiv preprint arXiv:2305.05383.
- Edward F Moore. 1959. The shortest path through a maze. In Proc. Int. Symp. Switching Theory, 1959, pages 285–292.
- Using an LLM to help with code understanding.
- OpenAI. 2023. GPT-4 technical report. CoRR, abs/2303.08774.
- Training language models to follow instructions with human feedback. In NeurIPS.
- R. C. Prim. 1957. Shortest connection networks and some generalizations. The Bell System Technical Journal, 36(6):1389–1401.
- On the Turing completeness of modern neural network architectures. In International Conference on Learning Representations.
- Jean E. Sammet. 1966. The use of English as a programming language. Commun. ACM, 9(3):228–230.
- Multitask prompted training enables zero-shot task generalization. In International Conference on Learning Representations.
- BLOOM: A 176B-parameter open-access multilingual language model. CoRR, abs/2211.05100.
- Dale Schuurmans. 2023. Memory augmented large language models are computationally universal.
- An empirical evaluation of using large language models for automated unit test generation. IEEE Transactions on Software Engineering, 50(1):85–105.
- H.T. Siegelmann and E.D. Sontag. 1995. On the computational power of neural nets. Journal of Computer and System Sciences, 50(1):132–150.
- Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. CoRR, abs/2206.04615.
- Llama: Open and efficient foundation language models. CoRR, abs/2302.13971.
- A. M. Turing. 1937. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1):230–265.
- The CLRS algorithmic reasoning benchmark. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 22084–22102. PMLR.
- Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5085–5109, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers. In Advances in Neural Information Processing Systems, volume 35, pages 12071–12083. Curran Associates, Inc.
- Finetuned language models are zero-shot learners. In International Conference on Learning Representations.
- Emergent abilities of large language models. Transactions on Machine Learning Research. Survey Certification.
- Chain of thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems.
- On the practical computational power of finite precision RNNs for language recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740–745, Melbourne, Australia. Association for Computational Linguistics.
- John William Joseph Williams. 1964. Algorithm 232: heapsort. Communications of the ACM, 7(6):347–348.