LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution (2312.09007v4)
Abstract: Task-oriented communications are an important element of future intelligent IoT systems. Existing IoT systems, however, are limited in their capacity to handle complex tasks, particularly in their interactions with humans to accomplish those tasks. In this paper, we present LLMind, an LLM-based task-oriented AI agent framework that enables effective collaboration among IoT devices, with humans issuing high-level verbal instructions, to perform complex tasks. Inspired by the functional-specialization theory of the brain, our framework integrates an LLM with domain-specific AI modules, enhancing its capabilities. Complex tasks, which may involve collaboration among multiple domain-specific AI modules and IoT devices, are executed through a control script generated by the LLM using a Language-Code transformation approach: language descriptions are first converted into an intermediate finite-state machine (FSM) representation before a final, precise transformation to code. Furthermore, the framework incorporates a novel experience-accumulation mechanism that improves response speed and effectiveness, allowing the framework to evolve and become progressively more sophisticated through continuing user and machine interactions.
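To make the Language-Code transformation concrete, the sketch below shows one plausible shape of the FSM intermediate step: a small state/transition representation is rendered into an executable control script. This is a minimal illustration, not the paper's implementation; the `State`/`Transition` classes, the `execute` callback, and the device commands (`camera.snapshot`, etc.) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    event: str    # trigger (e.g., a sensor report)
    target: str   # name of the next state

@dataclass
class State:
    name: str
    action: str   # hypothetical device command run on entering this state
    transitions: list = field(default_factory=list)

def fsm_to_script(states, start):
    """Render the FSM as the source text of a Python control script.

    The generated `run(events, execute)` function walks the event stream,
    follows matching transitions, and calls `execute` with the entered
    state's device command.
    """
    lines = [
        "def run(events, execute):",
        f"    state = {start!r}",
        "    for event in events:",
    ]
    kw = "if"  # first condition is `if`, the rest `elif`, so at most one transition fires per event
    for s in states:
        for t in s.transitions:
            target_action = next(x.action for x in states if x.name == t.target)
            lines.append(f"        {kw} state == {s.name!r} and event == {t.event!r}:")
            lines.append(f"            state = {t.target!r}")
            lines.append(f"            execute({target_action!r})")
            kw = "elif"
    lines.append("    return state")
    return "\n".join(lines)

# Example: a two-state surveillance FSM (hypothetical task).
states = [
    State("idle", "noop", [Transition("person_detected", "alert")]),
    State("alert", "camera.snapshot", [Transition("reset", "idle")]),
]
script = fsm_to_script(states, start="idle")

# Execute the generated control script, logging issued device commands.
namespace = {}
exec(script, namespace)
issued = []
final_state = namespace["run"](["person_detected", "reset"], issued.append)
```

Running the generated script on the event stream `["person_detected", "reset"]` drives the FSM from `idle` to `alert` (issuing `camera.snapshot`) and back to `idle` (issuing `noop`). The design point the sketch tries to capture is the one argued in the abstract: the FSM stage pins down states and transitions explicitly, so the final code-generation step is a mechanical, precise rendering rather than free-form generation.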
Authors: Hongwei Cui, Yuyang Du, Qun Yang, Yulin Shao, Soung Chang Liew