Asynchronous Tool Usage for Real-Time Agents (2410.21620v1)
Abstract: While frontier LLMs are capable tool-using agents, current AI systems still operate in a strict turn-based fashion, oblivious to the passage of time. This synchronous design forces user queries and tool use to occur sequentially, preventing the systems from multitasking and reducing interactivity. To address this limitation, we introduce asynchronous AI agents capable of parallel processing and real-time tool use. Our key contribution is an event-driven finite-state machine architecture for agent execution and prompting, integrated with automatic speech recognition and text-to-speech. Drawing inspiration from concepts originally developed for real-time operating systems, this work presents both a conceptual framework and practical tools for creating AI agents capable of fluid, multitasking interactions.
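To make the idea of an event-driven finite-state machine more concrete, here is a minimal Python sketch of how such an agent loop might be organized. It is an illustration under stated assumptions, not the paper's implementation: the state names, event names, transition table, and the `slow_tool` helper are all hypothetical. The point it tries to show is that a tool call can run in the background and report back as just another event, so the agent keeps handling user events in the meantime.

```python
import asyncio
from enum import Enum, auto


class State(Enum):
    # Hypothetical agent states, chosen for illustration only.
    IDLE = auto()
    THINKING = auto()
    SPEAKING = auto()


# Hypothetical transition table: (current state, event type) -> next state.
TRANSITIONS = {
    (State.IDLE, "user_input"): State.THINKING,
    (State.IDLE, "tool_result"): State.THINKING,
    (State.THINKING, "reply_ready"): State.SPEAKING,
    (State.SPEAKING, "user_input"): State.THINKING,  # user interrupts the agent
    (State.SPEAKING, "speech_done"): State.IDLE,
}


async def slow_tool(queue, delay):
    """Stand-in for a long-running tool call; posts its result as an event."""
    await asyncio.sleep(delay)
    queue.put_nowait(("tool_result", "weather: sunny"))


async def run_agent(scripted_events):
    """Drive the state machine with scripted events plus one background tool call."""
    queue = asyncio.Queue()
    state = State.IDLE
    trace = [state]

    # Start the tool without waiting for it; the event loop stays free.
    tool = asyncio.create_task(slow_tool(queue, 0.01))
    for event in scripted_events:
        queue.put_nowait(event)

    # Handle every scripted event and then the single tool result.
    for _ in range(len(scripted_events) + 1):
        kind, _payload = await queue.get()
        # Events with no matching transition leave the state unchanged.
        state = TRANSITIONS.get((state, kind), state)
        trace.append(state)

    await tool
    return trace


if __name__ == "__main__":
    events = [("user_input", "hi"), ("reply_ready", "hello"), ("speech_done", None)]
    for visited in asyncio.run(run_agent(events)):
        print(visited.name)
```

In this sketch the conversation completes a full turn and returns to `IDLE` before the tool result arrives, and the result then triggers a new `THINKING` phase on its own. A real system would replace the scripted events with speech recognition output and model generations, and would likely need priorities and cancellation, which are omitted here.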