Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning (2412.13631v1)

Published 18 Dec 2024 in cs.AI and cs.CL

Abstract: Theory of Mind (ToM) capabilities in LLMs have recently become a central object of investigation. Cognitive science distinguishes between two steps required for ToM tasks: 1) determining whether to invoke ToM, which includes choosing the appropriate Depth of Mentalizing (DoM), or level of recursion, required to complete a task; and 2) applying the correct inference given that DoM. In this position paper, we first identify several lines of work in different communities in AI, including LLM benchmarking, ToM add-ons, ToM probing, and formal models for ToM. We argue that recent work in AI tends to focus exclusively on the second step, which is typically framed as a static logic problem. We conclude with suggestions for improved evaluation of ToM capabilities inspired by dynamic environments used in cognitive tasks.

An Expert Overview of "Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning"

The paper "Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning" makes a substantial contribution to the investigation of Theory of Mind (ToM) capabilities in LLMs by addressing the multifaceted nature of ToM tasks. The authors emphasize the critical distinction between two integral steps in any ToM process: first, determining whether to invoke ToM at all, which includes selecting the appropriate Depth of Mentalizing (DoM), i.e., the level of recursion the task requires; and second, applying the correct inference once the DoM is established.
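
To make the two steps concrete, here is a minimal Python sketch of the decomposition. It is an illustration of the conceptual split, not the authors' implementation; every function name and the depth heuristic are hypothetical.

```python
# Illustrative sketch of the two-step ToM process: invocation (choosing a
# Depth of Mentalizing) followed by inference at that depth. All names and
# the depth heuristic are hypothetical, not from the paper.

def select_depth(context: dict) -> int:
    """Step 1 (invocation): decide whether ToM is needed, and at what
    Depth of Mentalizing (DoM). Depth 0 means no mentalizing at all."""
    if not context.get("other_agents"):
        return 0  # no other minds to model
    # Toy heuristic: nested belief references ("she thinks that he
    # thinks ...") call for deeper recursion.
    return 1 + context.get("nested_belief_mentions", 0)

def infer_belief(depth: int, true_state: dict, agent_view: dict) -> dict:
    """Step 2 (inference): level-k recursive belief attribution at the
    DoM chosen in step 1. agent_view is what the other agent observed."""
    if depth == 0:
        return true_state  # no mentalizing: answer from the world itself
    # Depth k: the other agent reasons from their own observations,
    # themselves mentalizing at depth k - 1.
    return {"other_believes": infer_belief(depth - 1, agent_view, agent_view)}

# First-order false-belief example (Sally-Anne style): the marble was moved
# after Sally left, so her belief should still point to the basket.
true_state = {"marble_location": "box"}
sally_view = {"marble_location": "basket"}
depth = select_depth({"other_agents": ["Sally"], "nested_belief_mentions": 0})
print(depth)                                    # -> 1
print(infer_belief(depth, true_state, sally_view))
# -> {'other_believes': {'marble_location': 'basket'}}
```

The point of the sketch is that a wrong answer can come from either function: a model may pick the wrong depth (invocation failure) or mishandle the recursion at the right depth (inference failure), and a single accuracy score cannot tell them apart.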

Current State of ToM Investigation in AI

Recent AI research has extensively explored LLMs' abilities to engage in social cognition, positioning ToM as a central aspect of these capabilities. The paper delineates several strands of work within the AI community related to ToM: benchmarking, injection of ToM capabilities, probing models for ToM, and the development of formal ToM architectures. The prevailing focus of this work, the authors argue, has been disproportionately on the inference step, typically framed as solving static logic problems, while the nuanced, real-time judgment required to invoke ToM in the first place goes largely unaddressed. This oversight can lead researchers to mistake failures to invoke an appropriate ToM strategy for failures of reasoning.

Theoretical and Practical Implications

The theoretical contribution of the paper is two-fold. First, it identifies a gap between cognitive science and AI research concerning the invocation phase of ToM, proposing that the evaluation of LLMs would benefit from dynamic methods more closely aligned with cognitive science practice. Second, by formalizing the distinction between ToM invocation and inference, the paper provides a framework that clears up misconceptions around how LLM performance on ToM benchmarks is assessed.

Practically, the authors argue for extending current benchmarks toward dynamic, interactive environments that more authentically reflect how ToM operates in human cognition. They also underscore the significance of the self-other distinction in ToM modeling, pointing out that capturing and processing this distinction more effectively could improve the trustworthiness and efficacy of AI systems in social contexts. The paper further highlights the potential for computational savings: with a more precise understanding of ToM, LLMs might handle tasks with comparable efficacy at reduced computational cost.
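
As one hedged illustration of what such a dynamic evaluation could look like, the sketch below runs a multi-turn episode in which the required DoM changes between turns, and scores invocation (did the model pick the right depth?) separately from inference (was the answer right?). The harness, the `Turn` structure, and the model interface are assumptions made here for illustration; the paper proposes the direction, not this protocol.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Turn:
    prompt: str        # what the model sees on this turn
    required_dom: int  # ground-truth Depth of Mentalizing for the turn
    answer: str        # ground-truth inference given that DoM

def run_dynamic_eval(model: Callable[[str], Tuple[int, str]],
                     episode: List[Turn]) -> dict:
    """Score invocation and inference separately, turn by turn."""
    invocation_hits = inference_hits = 0
    for turn in episode:
        predicted_dom, predicted_answer = model(turn.prompt)
        invocation_hits += predicted_dom == turn.required_dom
        inference_hits += predicted_answer == turn.answer
    n = len(episode)
    return {"invocation_acc": invocation_hits / n,
            "inference_acc": inference_hits / n}

# Toy episode where the needed DoM shifts mid-interaction: turn 1 is a plain
# factual question (DoM 0), turn 2 a first-order belief question (DoM 1).
episode = [
    Turn("Where is the marble now?", required_dom=0, answer="box"),
    Turn("Where will Sally look for it?", required_dom=1, answer="basket"),
]
always_mentalize = lambda prompt: (1, "basket")  # never adapts its depth
print(run_dynamic_eval(always_mentalize, episode))
# -> {'invocation_acc': 0.5, 'inference_acc': 0.5}
```

A static benchmark would report only a single score; separating the two accuracies makes visible whether a model over-mentalizes, under-mentalizes, or reasons poorly at the correct depth.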

Numerical Results and Contradictory Claims

Numerical evaluations in the existing literature show contrasting results when benchmarking LLMs for ToM capabilities, particularly when task structures are altered. The authors illustrate that while LLMs such as GPT-3 struggle with established tasks like ToMi, reports of success for more capable models vary, and those claims are challenged by alternative tests that perturb the task structure. Such disparities underline the importance of task design, and of clearly separating invocation from inference, in scholarly assessments.

Speculations for AI Developments

The authors foresee advancements in AI stemming from research in the cognitive and social sciences. Specifically, they present the move toward real-time interactive benchmarks, which would test both the inference and invocation phases of ToM, as a necessary next step for developing LLMs that operate with a more human-like understanding of social contexts. Furthermore, the introduction of adaptive mentalizing, where models adjust their mentalizing depth based on context, could enable more nuanced AI-human interactions.

In summary, this paper makes a compelling case for the need to broaden the current scope of AI research into Theory of Mind by integrating insights from cognitive science into model evaluation, test environments, and efficiency benchmarking. It points to the necessity of revisiting and refining both current benchmarks and methodologies to adequately capture and assess the sophisticated interplay of reasoning and mental state attribution inherent in human-like ToM.

Authors (4)
  1. Eitan Wagner (8 papers)
  2. Nitay Alon (5 papers)
  3. Joseph M. Barnby (2 papers)
  4. Omri Abend (75 papers)