Asynchronous Tool Usage for Real-Time Agents (2410.21620v1)

Published 28 Oct 2024 in cs.AI

Abstract: While frontier LLMs are capable tool-using agents, current AI systems still operate in a strict turn-based fashion, oblivious to the passage of time. This synchronous design forces user queries and tool use to occur sequentially, preventing the systems from multitasking and reducing interactivity. To address this limitation, we introduce asynchronous AI agents capable of parallel processing and real-time tool use. Our key contribution is an event-driven finite-state machine architecture for agent execution and prompting, integrated with automatic speech recognition and text-to-speech. Drawing on concepts originally developed for real-time operating systems, this work presents both a conceptual framework and practical tools for creating AI agents capable of fluid, multitasking interactions.
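
To make the idea of an event-driven finite-state machine with asynchronous tool use more concrete, here is a minimal sketch in Python's asyncio. It is an illustration under stated assumptions, not the authors' implementation: the three states, the event names, the transition table, and the `_fake_tool` helper are all hypothetical, and ASR and TTS are replaced by scripted events. The point it shows is that a slow tool runs in the background and reports back by posting an event to the same queue the speech events use, so the conversation keeps flowing while the tool is busy.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # waiting for the user or for a tool result
    LISTENING = auto()  # user is speaking (ASR would be active here)
    SPEAKING = auto()   # agent is talking (TTS would be active here)


@dataclass
class Event:
    kind: str           # hypothetical kinds: speech_start, utterance, tts_done, tool_result, stop
    payload: str = ""


# Hypothetical transition table: (current state, event kind) -> next state.
# Pairs not listed leave the state unchanged.
TRANSITIONS = {
    (State.IDLE, "speech_start"): State.LISTENING,
    (State.SPEAKING, "speech_start"): State.LISTENING,  # user interrupts the agent
    (State.LISTENING, "utterance"): State.SPEAKING,
    (State.SPEAKING, "tts_done"): State.IDLE,
    (State.IDLE, "tool_result"): State.SPEAKING,        # announce the tool result
}


class AsyncAgent:
    def __init__(self, tool_delay: float) -> None:
        self.state = State.IDLE
        self.queue: asyncio.Queue = asyncio.Queue()
        self.tool_delay = tool_delay
        self.log: list[tuple[str, str, str]] = []  # (event kind, payload, new state)
        self.tasks: set[asyncio.Task] = set()      # keep references to background tools

    async def _fake_tool(self, query: str) -> None:
        # Hypothetical slow tool: sleeps to simulate latency, then reports back
        # by posting an event instead of blocking the conversation loop.
        await asyncio.sleep(self.tool_delay)
        await self.queue.put(Event("tool_result", f"result for {query}"))

    async def run(self) -> None:
        while True:
            event = await self.queue.get()
            if event.kind == "stop":
                break
            if event.kind == "utterance" and event.payload.startswith("tool:"):
                # Launch the tool in the background; the loop keeps serving events.
                task = asyncio.create_task(self._fake_tool(event.payload[len("tool:"):]))
                self.tasks.add(task)
                task.add_done_callback(self.tasks.discard)
            self.state = TRANSITIONS.get((self.state, event.kind), self.state)
            self.log.append((event.kind, event.payload, self.state.name))


async def demo() -> list[tuple[str, str, str]]:
    agent = AsyncAgent(tool_delay=0.3)
    runner = asyncio.create_task(agent.run())
    script = [
        Event("speech_start"), Event("utterance", "tool:weather"), Event("tts_done"),
        Event("speech_start"), Event("utterance", "hello"), Event("tts_done"),
    ]
    for event in script:  # simulated user and TTS events, 20 ms apart
        await agent.queue.put(event)
        await asyncio.sleep(0.02)
    await asyncio.sleep(0.5)  # give the background tool time to finish
    await agent.queue.put(Event("stop"))
    await runner
    return agent.log


if __name__ == "__main__":
    for entry in asyncio.run(demo()):
        print(entry)
```

In this run the user asks for the (fake) weather tool, the agent acknowledges and returns to `IDLE`, and a second exchange ("hello") completes before the tool finishes; only then does the `tool_result` event move the agent back to `SPEAKING`. A strictly turn-based loop would instead have blocked on the tool call. Routing tool completions through the same event queue as speech events is one simple way to get this behavior; a fuller design would also need priorities (for example, deferring a tool announcement while the user is mid-utterance).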
