Agents Thinking Fast and Slow: A Talker-Reasoner Architecture (2410.08328v1)
Abstract: LLMs have enabled agents of all kinds to interact with users through natural conversation. Consequently, agents now have two jobs: conversing and planning/reasoning. Their conversational responses must be informed by all available information, and their actions must help to achieve goals. This dichotomy between conversing with the user and doing multi-step reasoning and planning can be seen as analogous to the human systems of "thinking fast and slow" as introduced by Kahneman. Our approach comprises a "Talker" agent (System 1) that is fast and intuitive, and tasked with synthesizing the conversational response; and a "Reasoner" agent (System 2) that is slower, more deliberative, and more logical, and is tasked with multi-step reasoning and planning, calling tools, performing actions in the world, and thereby producing the new agent state. We describe the new Talker-Reasoner architecture and discuss its advantages, including modularity and decreased latency. We ground the discussion in the context of a sleep coaching agent, in order to demonstrate real-world relevance.
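The abstract's division of labor can be made concrete with a minimal sketch, assuming a generic text-in/text-out `llm` callable, a dictionary of `tools`, and an `AgentState` with `beliefs` and `plan` fields; all of these names are illustrative assumptions, not APIs from the paper. The Talker replies immediately from whatever state the Reasoner last produced, while the Reasoner deliberately updates that state through tool calls and a multi-step plan.

```python
# Illustrative sketch of a Talker (System 1) / Reasoner (System 2) split.
# All names here are assumptions for the example, not the paper's implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Shared memory: written by the Reasoner, read by the Talker."""
    beliefs: Dict[str, str] = field(default_factory=dict)  # e.g. the user's goals and constraints
    plan: List[str] = field(default_factory=list)          # multi-step plan produced by the Reasoner


class TalkerReasonerAgent:
    def __init__(self, llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]]):
        self.llm = llm        # any text-in/text-out model call
        self.tools = tools    # tool name -> callable
        self.state = AgentState()

    def talk(self, history: List[str], user_msg: str) -> str:
        """System 1: fast, intuitive reply conditioned on the latest agent state."""
        prompt = (
            f"Beliefs: {self.state.beliefs}\nPlan: {self.state.plan}\n"
            f"Conversation so far: {history + [user_msg]}\nReply briefly:"
        )
        return self.llm(prompt)

    def reason(self, user_msg: str) -> None:
        """System 2: slower, deliberative update of beliefs and plan, possibly via tool calls."""
        for name, tool in self.tools.items():
            self.state.beliefs[name] = tool(user_msg)  # gather evidence with tools
        plan_text = self.llm(
            f"Given beliefs {self.state.beliefs} and the message '{user_msg}', "
            "write a numbered multi-step plan:"
        )
        self.state.plan = [line for line in plan_text.splitlines() if line.strip()]

    def step(self, history: List[str], user_msg: str) -> str:
        reply = self.talk(history, user_msg)  # answer immediately (low latency)
        self.reason(user_msg)                 # refresh state for future turns (could run asynchronously)
        return reply
```

In this sketch the Talker never waits for the Reasoner, which reflects the latency advantage the abstract claims; in a deployed system the `reason` call would typically run asynchronously or on a slower cadence than `talk`.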
- GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
- Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691 (2022).
- Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073 (2022).
- Large language models can implement policy iteration. Advances in Neural Information Processing Systems 36 (2024).
- Tom B Brown. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020).
- Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023).
- Jay W Forrester. 1971. Counterintuitive behavior of social systems. Theory and decision 2, 2 (1971), 109–140.
- Uta Frith and Christopher D Frith. 2003. Development and neurophysiology of mentalizing. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 358, 1431 (2003), 459–473.
- Learning actionable representations with goal-conditioned policies. arXiv preprint arXiv:1811.07819 (2018).
- David Ha and Jürgen Schmidhuber. 2018. World models. arXiv preprint arXiv:1803.10122 (2018).
- Reasoning with language model is planning with world model. arXiv preprint arXiv:2305.14992 (2023).
- Zhiting Hu and Tianmin Shu. 2023. Language models, agent models, and world models: The LAW for machine reasoning and planning. arXiv preprint arXiv:2312.05230 (2023).
- Arianna Huffington. 2016. The sleep revolution: Transforming your life, one night at a time. Harmony.
- Daniel Kahneman. 2011. Thinking, fast and slow. Farrar, Straus and Giroux (2011).
- Large language models are zero-shot reasoners. Advances in neural information processing systems 35 (2022), 22199–22213.
- Robotic systems architectures and programming. Springer handbook of robotics (2016), 283–306.
- Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474.
- Theory of mind for multi-agent collaboration via large language models. arXiv preprint arXiv:2310.10701 (2023).
- Code as policies: Language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 9493–9500.
- Kevin P Murphy. 2000. A survey of POMDP solution techniques.
- WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332 (2021).
- Generative agents: Interactive simulacra of human behavior. (2023), 1–22.
- David Premack and Guy Woodruff. 1978. Does the chimpanzee have a theory of mind? Behavioral and brain sciences 1, 4 (1978), 515–526.
- LLM-Assist: Enhancing closed-loop planning with language-based reasoning. arXiv preprint arXiv:2401.00125 (2023).
- HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. Advances in Neural Information Processing Systems 36 (2024).
- Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36 (2024).
- ProgPrompt: Generating situated robot task plans using large language models. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 11523–11530.
- Interactive Planning Using Large Language Models for Partially Observable Robotic Tasks. In 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 14054–14061.
- Reinforcement learning. Journal of Cognitive Neuroscience 11, 1 (1999), 126–134.
- Large language models as generalizable policies for embodied tasks. In The Twelfth International Conference on Learning Representations.
- Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023).
- LaMDA: Language models for dialog applications. arXiv preprint arXiv:2201.08239 (2022).
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
- Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 (2023).
- A survey on large language model based autonomous agents. Frontiers of Computer Science 18, 6 (2024), 186345.
- Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560 (2023).
- Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824–24837.
- Auto-GPT for online decision making: Benchmarks and additional opinions. arXiv preprint arXiv:2306.02224 (2023).
- ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 (2022).
- Empowering Large Language Model Agents through Action Learning. arXiv preprint arXiv:2402.15809 (2024).
- A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).
- How FaR Are Large Language Models From Agents with Theory-of-Mind? arXiv preprint arXiv:2310.03051 (2023).
- Self-discover: Large language models self-compose reasoning structures. arXiv preprint arXiv:2402.03620 (2024).