
LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem (2312.03815v2)

Published 6 Dec 2023 in cs.OS, cs.AI, cs.CL, and cs.LG

Abstract: This paper envisions a revolutionary AIOS-Agent ecosystem, where the LLM serves as the (Artificial) Intelligent Operating System (IOS, or AIOS)--an operating system "with soul". Upon this foundation, a diverse range of LLM-based AI Agent Applications (Agents, or AAPs) are developed, enriching the AIOS-Agent ecosystem and signaling a paradigm shift from the traditional OS-APP ecosystem. We envision that the LLM's impact will not be limited to the AI application level; instead, it will in turn revolutionize the design and implementation of computer systems, architecture, software, and programming languages, featuring several main concepts: LLM as OS (system-level), Agents as Applications (application-level), Natural Language as Programming Interface (user-level), and Tools as Devices/Libraries (hardware/middleware-level). We begin by introducing the architecture of traditional OS. Then we formalize a conceptual framework for AIOS through "LLM as OS (LLMOS)", drawing analogies between AIOS and traditional OS: the LLM is likened to the OS kernel, the context window to memory, external storage to the file system, hardware tools to peripheral devices, software tools to programming libraries, and user prompts to user commands. Subsequently, we introduce the new AIOS-Agent Ecosystem, where users can easily program Agent Applications (AAPs) using natural language, democratizing the development of software, in contrast to the traditional OS-APP ecosystem. Following this, we explore the diverse scope of Agent Applications. We delve into both single-agent and multi-agent systems, as well as human-agent interaction. Lastly, drawing on insights from the traditional OS-APP ecosystem, we propose a roadmap for the evolution of the AIOS-Agent ecosystem. This roadmap is designed to guide future research and development, suggesting the systematic progress of AIOS and its Agent applications.

Envisioning the Future of AIOS: LLMs as Operating Systems

Introduction

In recent years, the development of LLMs has paved the way for significant advancements in artificial intelligence. These models have shown remarkable capabilities in understanding and generating human-like text, suggesting a potential for broader applications beyond traditional tasks. A groundbreaking paper from Rutgers University explores the concept of (Artificial) Intelligent Operating Systems (AIOS), where LLMs serve as the foundation of an intelligent operating system. This novel concept suggests a shift from the traditional OS-APP ecosystem to an AIOS-Agent ecosystem, fundamentally altering how users interact with computer systems.

LLM as OS (LLMOS)

The paper introduces the concept of "LLM as OS (LLMOS)," proposing a framework that mirrors the architecture of traditional operating systems but places an LLM at its core. The framework draws several analogies between AIOS components and conventional OS elements: the LLM is likened to the OS kernel, while the context window and external storage mirror memory and the file system, respectively. Notably, LLMOS reimagines device management by integrating both hardware and software tools that extend the LLM's capabilities, enabling it to interact effectively with the digital and physical worlds.
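
To make the analogy concrete, here is a minimal, hypothetical sketch of the LLMOS mapping in Python. The class and field names (`LLMOS`, `Tool`, `syscall`) are ours for illustration only, and the LLM "kernel" is stubbed out; the paper proposes the conceptual mapping, not this code.

```python
# Illustrative only: the LLM plays the kernel, the context window plays main
# memory, external storage plays the file system, and registered tools play
# peripheral devices / programming libraries. All names are assumptions.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    """A 'device' or 'library': an external capability the kernel can invoke."""
    name: str
    run: Callable[[str], str]

@dataclass
class LLMOS:
    context_window: list[str] = field(default_factory=list)         # "main memory"
    external_storage: dict[str, str] = field(default_factory=dict)  # "file system"
    tools: dict[str, Tool] = field(default_factory=dict)            # "devices/libraries"

    def syscall(self, prompt: str) -> str:
        """A user prompt is the 'user command'; the LLM kernel services it."""
        self.context_window.append(prompt)
        # A real system would invoke the LLM here; we stub the kernel response.
        return f"[kernel] handled: {prompt}"
```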

Reasoning and Planning in LLMOS

LLMOS aims to endow LLMs with sophisticated reasoning and planning abilities, drawing inspiration from classical problems such as the Dining Philosophers problem to illustrate parallels in resource allocation and synchronization within a multi-agent ecosystem. The paper discusses several strategies for enhancing LLMs' planning capabilities, including single-path planning through Chain of Thought (CoT) and multi-path planning with Tree of Thoughts (ToT), showcasing the potential for more advanced reasoning and creative problem-solving within LLMOS.
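
The distinction between the two planning styles can be sketched as search procedures: single-path CoT greedily extends one reasoning chain, while ToT keeps a small beam of alternative chains and prunes by a score. In this hypothetical sketch, `propose_thoughts` and `score` stand in for LLM calls; nothing here is an API from the paper.

```python
# Hedged sketch: CoT as greedy single-path decoding, ToT as beam search over
# partial reasoning chains. The proposer and scorer are stubs for LLM calls.
import heapq

def propose_thoughts(state: str, k: int) -> list[str]:
    # Stub: a real system would sample k candidate next "thoughts" from an LLM.
    return [f"{state} -> step{i}" for i in range(k)]

def score(state: str) -> float:
    # Stub: a real system would ask the LLM (or a verifier) to rate the state.
    return -len(state)

def chain_of_thought(problem: str, depth: int) -> str:
    """Single-path: greedily extend one reasoning chain."""
    state = problem
    for _ in range(depth):
        state = max(propose_thoughts(state, k=3), key=score)
    return state

def tree_of_thoughts(problem: str, depth: int, beam: int = 2) -> str:
    """Multi-path: keep the best `beam` partial chains at each depth."""
    frontier = [problem]
    for _ in range(depth):
        candidates = [t for s in frontier for t in propose_thoughts(s, k=3)]
        frontier = heapq.nlargest(beam, candidates, key=score)
    return max(frontier, key=score)
```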

Tool Management

Tools in LLMOS serve as a bridge between the LLM and its operational environment, expanding its capabilities. The discussion of tool categories emphasizes both software and hardware tools, with examples such as ToolCoder (teaching code models to use APIs) and SayCan (grounding language in robotic affordances) illustrating their practical applications. Furthermore, the concept of self-made tools points to a future where LLMs not only use tools but also generate and improve them, pushing the boundaries of autonomous problem-solving.
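
One way to picture tool management is a driver-like registry: software tools register under a name, and a "self-made" tool is synthesized at runtime and installed alongside them. The registry-and-decorator scheme below is an assumption for illustration, not the paper's design, and `make_tool` is a stand-in for LLM tool synthesis.

```python
# Illustrative tool management: tools register like drivers, and a self-made
# tool is generated at runtime and added to the same registry.
from typing import Callable

TOOL_REGISTRY: dict[str, Callable[[str], str]] = {}

def register_tool(name: str):
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@register_tool("search")
def search(query: str) -> str:
    return f"results for {query!r}"  # stub for a software tool (e.g., a search API)

def make_tool(spec: str) -> Callable[[str], str]:
    """Stand-in for LLM tool synthesis: wrap the spec into a new callable."""
    def generated(arg: str) -> str:
        return f"tool[{spec}] applied to {arg}"
    return generated

# The agent "writes" a new tool and installs it, mirroring self-made tools.
TOOL_REGISTRY["unit_convert"] = make_tool("convert miles to km")
print(TOOL_REGISTRY["unit_convert"]("26.2 miles"))
```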

AIOS-Agent Ecosystem

The AIOS-Agent ecosystem envisions an environment where users and developers can create Agent Applications (AAPs) simply by using natural language. This promises to democratize software development, making it accessible to a broad audience without specialized programming knowledge. The paper examines single-agent and multi-agent applications, highlighting scenarios in the physical and digital worlds where agents autonomously or collaboratively perform complex tasks, showcasing the practical implications of the ecosystem.
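
Read this way, the user's request is the "source code": a planner decomposes it into steps, and one or more agents execute them. The sketch below is a hedged illustration of that flow; `plan`, `Agent`, and the role names are hypothetical stand-ins, not the paper's API.

```python
# Illustrative: natural language as the programming interface. A (stubbed)
# planner turns a request into steps, executed by collaborating agents.

def plan(request: str) -> list[str]:
    # Stub: in an AIOS, the LLM would decompose the request into steps.
    return [f"parse '{request}'", "gather data", "draft result"]

class Agent:
    def __init__(self, role: str):
        self.role = role

    def act(self, step: str) -> str:
        return f"{self.role} did: {step}"

# Multi-agent execution: steps round-robin across collaborating agents.
agents = [Agent("researcher"), Agent("writer")]
for i, step in enumerate(plan("summarize recent work on LLM agents")):
    print(agents[i % len(agents)].act(step))
```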

Future Research and Development

Considering the evolution of traditional operating systems, the paper outlines several future directions for AIOS, focusing on memory and tool management, communication, and security. Drawing parallels with historical OS advancements, the authors suggest innovative approaches for enhancing LLMOS's resource management capabilities, developing standardized communication protocols, and addressing security vulnerabilities. These insights present a strategic roadmap for systematic progress in AIOS and agent application research, with an emphasis on learning from the traditional OS-APP ecosystem's development trajectory.
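
As one concrete way to read the memory-management direction, the sketch below treats the context window like main memory and external storage like swap, paging old conversation turns out and retrieving them back by keyword. The FIFO eviction and keyword recall are our simplifications for illustration, not a mechanism specified in the paper.

```python
# Hedged sketch of OS-style memory management for an LLM's context window:
# evict the oldest turns to external storage, page them back in on demand.
from collections import deque

class ContextMemory:
    def __init__(self, window_size: int):
        self.window: deque[str] = deque()  # "main memory" (context window)
        self.storage: list[str] = []       # "swap" (external storage)
        self.window_size = window_size

    def append(self, turn: str) -> None:
        self.window.append(turn)
        while len(self.window) > self.window_size:
            self.storage.append(self.window.popleft())  # page out oldest turn

    def recall(self, keyword: str) -> list[str]:
        """Page relevant turns back in via naive keyword retrieval."""
        hits = [t for t in self.storage if keyword in t]
        for t in hits:
            self.storage.remove(t)
            self.append(t)
        return hits
```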

Conclusion

The conceptual framework for AIOS presented in this paper signals a transformative change in the field of artificial intelligence and operating systems. By positioning LLMs as the foundational element of intelligent operating systems, the authors propose a shift towards a more interactive, intuitive, and accessible computing paradigm. This ambitious vision for AIOS not only challenges conventional computing paradigms but also opens up new avenues for innovation, collaboration, and interaction between humans and intelligent systems. The proposed roadmap for future research and development underscores the potential for AIOS to revolutionize our interaction with technology, promising a future where the boundaries between human intelligence and artificial intelligence become increasingly blurred.

References (142)
  1. Leakage power analysis and reduction for nanoscale circuits. IEEE Micro 26, 2 (2006), 68–80.
  2. Do as i can, not as i say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691 (2022).
  3. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. arXiv preprint arXiv:2305.13245 (2023).
  4. ETC: Encoding long and structured inputs in transformers. arXiv preprint arXiv:2004.08483 (2020).
  5. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020).
  6. Graph of thoughts: Solving elaborate problems with large language models. arXiv preprint arXiv:2308.09687 (2023).
  7. Emergent autonomous scientific research capabilities of large language models. arXiv preprint arXiv:2304.05332 (2023).
  8. Improving language models by retrieving from trillions of tokens. In International conference on machine learning. PMLR, 2206–2240.
  9. ChemCrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376 (2023).
  10. Plans and resource-bounded practical reasoning. Computational intelligence 4, 3 (1988), 349–355.
  11. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
  12. Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv:2303.12712 [cs.CL]
  13. ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate. arXiv:2308.07201 [cs.CL]
  14. Harrison Chase. 2022. LangChain. https://github.com/hwchase17/langchain
  15. Soundspaces: Audio-visual navigation in 3d environments. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VI 16. Springer, 17–36.
  16. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).
  17. Extending context window of large language models via positional interpolation. arXiv preprint arXiv:2306.15595 (2023).
  18. Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128 (2023).
  19. Multics: The First Seven Years. In Proceedings of the May 16-18, 1972, Spring Joint Computer Conference (Atlantic City, New Jersey) (AFIPS ’72 (Spring)). Association for Computing Machinery, New York, NY, USA, 571–583. https://doi.org/10.1145/1478873.1478950
  20. Linux Device Drivers, 3rd Edition. O’Reilly Media, Inc.
  21. Textworld: A learning environment for text-based games. In Computer Games: 7th Workshop, CGW 2018, Held in Conjunction with the 27th International Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, July 13, 2018, Revised Selected Papers 7. Springer, 41–75.
  22. Search engines: Information retrieval in practice. Vol. 520. Addison-Wesley Reading.
  23. CXL. 2023. Compute Express Link Specification. https://www.computeexpresslink.org/.
  24. Mind2Web: Towards a Generalist Agent for the Web. arXiv preprint arXiv:2306.06070 (2023).
  25. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
  26. Improving Factuality and Reasoning in Language Models through Multiagent Debate. arXiv:2305.14325 [cs.CL]
  27. Towards measuring the representation of subjective global opinions in language models. arXiv preprint arXiv:2306.16388 (2023).
  28. Susan S Fainstein and James DeFilippis. 2015. Readings in planning theory. John Wiley & Sons.
  29. Minedojo: Building open-ended embodied agents with internet-scale knowledge. Advances in Neural Information Processing Systems 35 (2022), 18343–18362.
  30. A Large Language Model Enhanced Conversational Recommender System. arXiv preprint arXiv:2308.06212 (2023).
  31. Giorgio Franceschelli and Mirco Musolesi. 2023. On the creativity of large language models. arXiv preprint arXiv:2304.00008 (2023).
  32. Drive Like a Human: Rethinking Autonomous Driving with Large Language Models. arXiv:2307.07162 [cs.RO]
  33. Improving language model negotiation with self-play and in-context learning from ai feedback. arXiv preprint arXiv:2305.10142 (2023).
  34. OpenAGI: When LLM Meets Domain Experts. In Thirty-seventh Conference on Neural Information Processing Systems.
  35. Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). In Proceedings of the 16th ACM Conference on Recommender Systems. 299–315.
  36. Significant Gravitas. 2023. AutoGPT. https://news.agpt.co/
  37. Gpu kernels for block-sparse weights. arXiv preprint arXiv:1711.09224 3, 2 (2017), 2.
  38. Retrieval augmented language model pre-training. In International conference on machine learning. PMLR, 3929–3938.
  39. Interactive fiction games: A colossal adventure. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 7903–7910.
  40. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352 (2023).
  41. ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory. arXiv preprint arXiv:2306.03901 (2023).
  42. War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars. arXiv preprint arXiv:2311.17227 (2023).
  43. How to Index Item IDs for Recommendation Foundation Models. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region. 195–204.
  44. Jie Huang and Kevin Chen-Chuan Chang. 2022. Towards reasoning in large language models: A survey. arXiv preprint arXiv:2212.10403 (2022).
  45. Benchmarking Large Language Models As AI Research Agents. arXiv preprint arXiv:2310.03302 (2023).
  46. The POSIX standard. https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/.
  47. Few-shot learning with retrieval augmented language models. arXiv preprint arXiv:2208.03299 (2022).
  48. Llmlingua: Compressing prompts for accelerated inference of large language models. arXiv preprint arXiv:2310.05736 (2023).
  49. Billion-scale similarity search with gpus. IEEE Transactions on Big Data 7, 3 (2019), 535–547.
  50. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906 (2020).
  51. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning. PMLR, 5156–5165.
  52. Language Models can Solve Computer Tasks. arXiv preprint arXiv:2303.17491 (2023).
  53. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020).
  54. Large language models are zero-shot reasoners. Advances in neural information processing systems 35 (2022), 22199–22213.
  55. Prompted LLMs as Chatbot Modules for Long Open-domain Conversation. arXiv preprint arXiv:2305.04533 (2023).
  56. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.
  57. CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society. arXiv:2303.17760 [cs.AI]
  58. Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate. arXiv:2305.19118 [cs.CL]
  59. Taskmatrix. ai: Completing tasks by connecting foundation models with millions of apis. arXiv preprint arXiv:2303.16434 (2023).
  60. Jerry Liu. 2022. LlamaIndex. https://doi.org/10.5281/zenodo.1234
  61. Lost in the middle: How language models use long contexts. arXiv preprint arXiv:2307.03172 (2023).
  62. Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency. arXiv preprint arXiv:2309.17382 (2023).
  63. BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents. arXiv preprint arXiv:2308.05960 (2023).
  64. MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation. arXiv:2308.08239 [cs.CL]
  65. Ingo Molnár. 2007. Linux CFS Scheduler. https://docs.kernel.org/scheduler/sched-design-CFS.html.
  66. Levels of AGI: Operationalizing Progress on the Path to AGI. arXiv preprint arXiv:2311.02462 (2023).
  67. WebGPT: Browser-assisted question-answering with human feedback. arXiv:2112.09332 [cs.CL]
  68. OpenAI. 2023. GPT-4V(ision) System Card.
  69. H. Orman. 2003. The Morris worm: a fifteen-year perspective. IEEE Security & Privacy 1, 5 (2003), 35–43. https://doi.org/10.1109/MSECP.2003.1236233
  70. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022), 27730–27744.
  71. MemGPT: Towards LLMs as Operating Systems. arXiv preprint arXiv:2310.08560 (2023).
  72. Unifying Large Language Models and Knowledge Graphs: A Roadmap. arXiv preprint arXiv:2306.08302 (2023).
  73. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255 (2022).
  74. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–22.
  75. Gorilla: Large language model connected with massive apis. arXiv preprint arXiv:2305.15334 (2023).
  76. Train short, test long: Attention with linear biases enables input length extrapolation. arXiv preprint arXiv:2108.12409 (2021).
  77. Visual Adversarial Examples Jailbreak Aligned Large Language Models. arXiv:2306.13213 [cs.CR]
  78. Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! arXiv preprint arXiv:2310.03693 (2023).
  79. Communicative agents for software development. arXiv preprint arXiv:2307.07924 (2023).
  80. CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation. arXiv preprint arXiv:2305.14318 (2023).
  81. Tool learning with foundation models. arXiv preprint arXiv:2304.08354 (2023).
  82. Toolllm: Facilitating large language models to master 16000+ real-world apis. arXiv preprint arXiv:2307.16789 (2023).
  83. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
  84. Dennis M. Ritchie and Ken Thompson. 1974. The UNIX Time-Sharing System. Commun. ACM 17, 7 (jul 1974), 365–375. https://doi.org/10.1145/361011.361061
  85. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 3, 4 (2009), 333–389.
  86. Code Llama: Open Foundation Models for Code. arXiv:2308.12950 [cs.CL]
  87. Tptu: Task planning and tool usage of large language model-based ai agents. arXiv preprint arXiv:2308.03427 (2023).
  88. Identifying the Risks of LM Agents with an LM-Emulated Sandbox. arXiv:2309.15817 [cs.AI]
  89. Stuart Russell and Peter Norvig. 1995. Artificial Intelligence: A Modern Approach. Prentice Hall series in artificial intelligence. Prentice Hall, Englewood Cliffs, NJ.
  90. Personality traits in large language models. arXiv preprint arXiv:2307.00184 (2023).
  91. Gerard Salton. 1975. A vector space model for information retrieval. Journal of the ASIS (1975), 613–620.
  92. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761 (2023).
  93. Personality Traits in Large Language Models. arXiv:2307.00184 [cs.CL]
  94. Noam Shazeer. 2019. Fast transformer decoding: One write-head is all you need. arXiv preprint arXiv:1911.02150 (2019).
  95. Large language models can be easily distracted by irrelevant context. In International Conference on Machine Learning. PMLR, 31210–31227.
  96. Reflexion: Language agents with verbal reinforcement learning. arXiv preprint arXiv:2303.11366 (2023).
  97. Alfworld: Aligning text and embodied environments for interactive learning. arXiv preprint arXiv:2010.03768 (2020).
  98. A DSL-Based Approach to Software Development and Deployment on Cloud. In 2010 24th IEEE International Conference on Advanced Information Networking and Applications. 414–421. https://doi.org/10.1109/AINA.2010.81
  99. I. Stoica and H. Abdel-Wahab. 1995. Earliest Eligible Virtual Deadline First: A Flexible and Accurate Mechanism for Proportional Share Resource Allocation. Technical Report. USA.
  100. Roformer: Enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864 (2021).
  101. Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration. arXiv:2310.00280 [cs.AI]
  102. A length-extrapolatable transformer. arXiv preprint arXiv:2212.10554 (2022).
  103. Anirudh S Sundar and Larry Heck. 2023. cTBL: Augmenting Large Language Models for Conversational Tables. arXiv preprint arXiv:2303.12024 (2023).
  104. ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases. arXiv preprint arXiv:2306.05301 (2023).
  105. Stanford Alpaca: An Instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.
  106. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).
  107. UW:CSE451. 2023. History of Operating Systems. https://courses.cs.washington.edu/courses/cse451/16wi/readings/lecture_readings/LCM_OperatingSystemsTimeline_Color_acd_newsize.pdf.
  108. Large Language Models Still Can’t Plan (A Benchmark for LLMs on Planning and Reasoning about Change). In NeurIPS 2022 Foundation Models for Decision Making Workshop.
  109. Attention is all you need. Advances in neural information processing systems 30 (2017).
  110. Chatgpt for robotics: Design principles and model abilities. Microsoft Auton. Syst. Robot. Res 2 (2023), 20.
  111. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 (2023).
  112. RecAgent: A Novel Simulation Paradigm for Recommender Systems. arXiv preprint arXiv:2306.02552 (2023).
  113. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020).
  114. Self-Consistency Improves Chain of Thought Reasoning in Language Models. In The Eleventh International Conference on Learning Representations.
  115. RecMind: Large Language Model Powered Agent For Recommendation. arXiv preprint arXiv:2308.14296 (2023).
  116. Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering. arXiv preprint arXiv:2309.02233 (2023).
  117. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560 (2023).
  118. Humanoid Agents: Platform for Simulating Human-like Generative Agents. arXiv:2310.05418 [cs.CL]
  119. Jailbroken: How Does LLM Safety Training Fail? arXiv:2307.02483 [cs.LG]
  120. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
  121. Michael Wooldridge and Nicholas R Jennings. 1995. Intelligent agents: Theory and practice. The knowledge engineering review 10, 2 (1995), 115–152.
  122. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155 [cs.AI]
  123. Lemur: Harmonizing Natural Language and Code for Language Agents. arXiv preprint arXiv:2310.06830 (2023).
  124. Chenghao Yang and Allyson Ettinger. 2023. Can You Follow Me? Testing Situational Understanding in ChatGPT. arXiv preprint arXiv:2310.16135 (2023).
  125. A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks. arXiv preprint arXiv:2308.14367 (2023).
  126. Shadow alignment: The ease of subverting safely-aligned language models. arXiv preprint arXiv:2310.02949 (2023).
  127. Mm-react: Prompting chatgpt for multimodal reasoning and action. arXiv preprint arXiv:2303.11381 (2023).
  128. Webshop: Towards scalable real-world web interaction with grounded language agents. Advances in Neural Information Processing Systems 35 (2022), 20744–20757.
  129. Shunyu Yao and Karthik Narasimhan. 2023. Language Agents in the Digital World: Opportunities and Risks. princeton-nlp.github.io (Jul 2023). https://princeton-nlp.github.io/language-agent-impact/
  130. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601 (2023).
  131. ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations.
  132. MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models. arXiv:2310.11954 [cs.CL]
  133. Wordcraft: story writing with large language models. In 27th International Conference on Intelligent User Interfaces. 841–852.
  134. Big bird: Transformers for longer sequences. Advances in neural information processing systems 33 (2020), 17283–17297.
  135. Zhuosheng Zhang and Aston Zhang. 2023. You Only Look at Screens: Multimodal Chain-of-Action Agents. arXiv preprint arXiv:2309.11436 (2023).
  136. ToolCoder: Teach Code Generation Models to use APIs with search tools. arXiv preprint arXiv:2305.04032 (2023).
  137. UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning. arXiv preprint arXiv:2306.10543 (2023).
  138. MemoryBank: Enhancing Large Language Models with Long-Term Memory. arXiv preprint arXiv:2305.10250 (2023).
  139. Docprompting: Generating code by retrieving the docs. In The Eleventh International Conference on Learning Representations.
  140. Agents: An Open-source Framework for Autonomous Language Agents. arXiv:2309.07870 [cs.CL]
  141. Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory. arXiv preprint arXiv:2305.17144 (2023).
  142. Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv:2307.15043 [cs.CL]
Authors (6)
  1. Yingqiang Ge (36 papers)
  2. Yujie Ren (7 papers)
  3. Wenyue Hua (51 papers)
  4. Shuyuan Xu (31 papers)
  5. Juntao Tan (33 papers)
  6. Yongfeng Zhang (163 papers)
Citations (22)