L2MAC: Large Language Model Automatic Computer for Extensive Code Generation (2310.02003v5)

Published 2 Oct 2023 in cs.SE, cs.AI, cs.LG, and cs.PL

Abstract: Transformer-based LLMs are constrained by the fixed context window of the underlying transformer architecture, hindering their ability to produce long and coherent outputs. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long output generation tasks since they (1) only focus on reading memory and reduce its evolution to the concatenation of new memories or (2) use very specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, an LLM-based multi-agent system, for long and consistent output generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction in turn is executed by a separate LLM agent, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate extensive outputs, bypassing the constraints of the finite context window while producing outputs that fulfill a complex user-specified task. We empirically demonstrate that L2MAC achieves state-of-the-art performance in generating large codebases for system design tasks, significantly outperforming other coding methods in implementing the detailed user-specified task; we show that L2MAC works for general-purpose extensive text-based tasks, such as writing an entire book; and we provide valuable insights into L2MAC's performance improvement over existing methods.
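
To make the abstract's architecture concrete, here is a minimal, hypothetical Python sketch of a stored-program control loop in the spirit of L2MAC: an instruction registry drives the run, a file store holds intermediate and final outputs, and a control unit gives each instruction to a separate agent call while keeping the prompt within a context budget. The names (FileStore, run_l2mac_like_loop, stub_agent) and the summarization strategy are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FileStore:
    """Persistent output memory: holds final and intermediate files."""
    files: Dict[str, str] = field(default_factory=dict)

    def read(self, path: str) -> str:
        return self.files.get(path, "")

    def write(self, path: str, content: str) -> None:
        self.files[path] = content


def run_l2mac_like_loop(
    instruction_registry: List[str],
    file_store: FileStore,
    llm_agent: Callable[[str, FileStore], Dict[str, str]],
    max_context_chars: int = 8_000,
) -> FileStore:
    """Control unit (sketch): execute each stored instruction with a fresh agent
    context, exposing only a bounded view of the file store to the prompt."""
    for step, instruction in enumerate(instruction_registry):
        # Precise memory read: surface a compact listing rather than full file contents.
        listing = "\n".join(f"- {p} ({len(c)} chars)" for p, c in file_store.files.items())
        prompt = (
            f"Step {step + 1}/{len(instruction_registry)}\n"
            f"Instruction: {instruction}\n"
            f"Existing files:\n{listing}"
        )[:max_context_chars]

        # Each instruction is executed by a separate agent call; this hypothetical
        # agent returns a mapping of file paths to new or updated contents.
        for path, content in llm_agent(prompt, file_store).items():
            file_store.write(path, content)  # precise memory write
    return file_store


if __name__ == "__main__":
    # Stub standing in for a real LLM call, just to make the loop runnable.
    def stub_agent(prompt: str, store: FileStore) -> Dict[str, str]:
        return {f"step_{len(store.files)}.py": f"# generated for prompt of {len(prompt)} chars\n"}

    registry = ["Write the data model", "Write the API layer", "Write the tests"]
    result = run_l2mac_like_loop(registry, FileStore(), stub_agent)
    print(sorted(result.files))
```

The key design point the sketch illustrates is that long-output generation is decomposed across instructions: each agent call sees only its instruction plus a bounded view of memory, while the file store accumulates output that can grow far beyond any single context window.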

Authors (3)
  1. Samuel Holt (18 papers)
  2. Max Ruiz Luyten (6 papers)
  3. Mihaela van der Schaar (321 papers)
Citations (9)