How FaR Are Large Language Models From Agents with Theory-of-Mind?

(arXiv:2310.03051)
Published Oct 4, 2023 in cs.CL and cs.AI

Abstract

"Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those inferences. Existing question answering benchmarks such as ToMi ask models questions to make inferences about beliefs of characters in a story, but do not test whether models can then use these inferences to guide their actions. We propose a new evaluation paradigm for LLMs: Thinking for Doing (T4D), which requires models to connect inferences about others' mental states to actions in social scenarios. Experiments on T4D demonstrate that LLMs such as GPT-4 and PaLM 2 seemingly excel at tracking characters' beliefs in stories, but they struggle to translate this capability into strategic action. Our analysis reveals the core challenge for LLMs lies in identifying the implicit inferences about mental states without being explicitly asked about as in ToMi, that lead to choosing the correct action in T4D. To bridge this gap, we introduce a zero-shot prompting framework, Foresee and Reflect (FaR), which provides a reasoning structure that encourages LLMs to anticipate future challenges and reason about potential actions. FaR boosts GPT-4's performance from 50% to 71% on T4D, outperforming other prompting methods such as Chain-of-Thought and Self-Ask. Moreover, FaR generalizes to diverse out-of-distribution story structures and scenarios that also require ToM inferences to choose an action, consistently outperforming other methods including few-shot in-context learning.

References
  1. Jacob Andreas. Language models as agent models. In Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 5769–5779, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. https://aclanthology.org/2022.findings-emnlp.423.
  2. PaLM 2 Technical Report
  3. MindCraft: Theory of mind modeling for situated dialogue in collaborative tasks. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 1112–1125
  4. Does the autistic child have a “theory of mind”? Cognition, 21(1):37–46
  5. Recognition of faux pas by normally developing children and children with Asperger syndrome or high-functioning autism. Journal of Autism and Developmental Disorders, 29:407–418
  6. Graph of Thoughts: Solving Elaborate Problems with Large Language Models
  7. Language Models are Few-Shot Learners
  8. Susan T. Fiske. Thinking is for doing: Portraits of social cognition from daguerreotype to laserphoto. Journal of Personality and Social Psychology, 63(6):877
  9. Development and neurophysiology of mentalizing. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 358(1431):459–473
  10. A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
  11. Reasoning with Language Model is Planning with World Model
  12. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107
  13. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35:22199–22213
  14. Evaluating Large Language Models in Theory of Mind Tasks
  15. Revisiting the evaluation of theory of mind through question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 5872–5877
  16. Dissociating language and thought in large language models
  17. Reframing Instructional Prompts to GPTk's Language
  18. Evaluating theory of mind in question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2392–2400
  19. OpenAI. ChatGPT: Optimizing language models for dialogue, 2022. https://openai.com/blog/chatgpt/.
  20. OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023
  21. Generative Agents: Interactive Simulacra of Human Behavior
  22. Three-year-olds’ difficulty with false belief: The case for a conceptual deficit. British Journal of Developmental Psychology, 5(2):125–137
  23. Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4):515–526
  24. Measuring and Narrowing the Compositionality Gap in Language Models
  25. Social IQa: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 4463–4473, Hong Kong, China, 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1454. https://aclanthology.org/D19-1454.
  26. Neural theory-of-mind? On the limits of social intelligence in large LMs. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 3762–3780, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. https://aclanthology.org/2022.emnlp-main.248.
  27. Toolformer: Language Models Can Teach Themselves to Use Tools
  28. Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
  29. Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models
  30. How well do LLMs perform on faux pas tests? In Findings of the Association for Computational Linguistics: ACL 2023, pp. 10438–10451, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-acl.663. https://aclanthology.org/2023.findings-acl.663.
  31. Interactive query-assisted summarization via deep reinforcement learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2551–2568, Seattle, United States, July 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.naacl-main.184. https://aclanthology.org/2022.naacl-main.184.
  32. The consideration of future consequences: Weighing immediate and distant outcomes of behavior. Journal of Personality and Social Psychology, 66(4):742
  33. Do large language models know what humans know? Cognitive Science, 47(7):e13309
  34. Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
  35. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837
  36. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13(1):103–128
  37. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  38. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023.
  39. Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models
  40. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
  41. I cast detect thoughts: Learning to converse and guide with intents and theory-of-mind in dungeons and dragons. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 11136–11155, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.624. https://aclanthology.org/2023.acl-long.624.