
A Notion of Complexity for Theory of Mind via Discrete World Models (2406.11911v3)

Published 16 Jun 2024 in cs.AI, cs.CL, and cs.LG

Abstract: Theory of Mind (ToM) can be used to assess the capabilities of LLMs in complex scenarios where social reasoning is required. While the research community has proposed many ToM benchmarks, their hardness varies greatly, and their complexity is not well defined. This work proposes a framework inspired by cognitive load theory to measure the complexity of ToM tasks. We quantify a problem's complexity as the number of states necessary to solve it correctly. Our complexity measure also accounts for spurious states of a ToM problem designed to make it apparently harder. We use our method to assess the complexity of five widely adopted ToM benchmarks. On top of this framework, we design a prompting technique that augments the information available to a model with a description of how the environment changes with the agents' interactions. We name this technique Discrete World Models (DWM) and show how it elicits superior performance on ToM tasks.

The paper introduces a framework for quantifying the complexity of Theory of Mind (ToM) tasks, specifically in the context of assessing LLMs. Its significance lies in measuring a task's complexity as the number of "state events" — changes of state that must be tracked to solve it — which is particularly valuable given the current lack of a unified definition of complexity for ToM benchmarks.

Summary and Contributions

  1. Complexity Framework: The paper presents a framework that defines ToM task complexity based on the number of state transitions or state events required to solve the task. This measure separates relevant state changes (stateful) from irrelevant ones (stateless), avoiding unnecessary complexity.
  2. Task Evaluation: The authors apply their framework to five widely adopted ToM benchmarks, revealing substantial variance in their complexity. The framework identifies which tasks are inherently more demanding for LLMs, beyond superficial or spurious task difficulty.
  3. Discrete World Models (DWM): A novel prompting technique called Discrete World Models (DWM) is developed, inspired by dividing tasks into discrete state sequences similar to least-to-most prompting and Tree-of-Thought (ToT) methodologies. By enhancing model prompts with detailed descriptions of how the environment changes, DWM achieves superior performance in complicated ToM tasks, surpassing previous approaches like Chain of Thought (CoT) prompting or ToT.
  4. Empirical Evaluation: The effectiveness of DWM is empirically validated across a spectrum of LLMs, including GPT-3.5, GPT-4, and others, demonstrating improved accuracy in difficult ToM problems, particularly those with a well-defined state space.

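The DWM idea described above — splitting a narrative into discrete steps and asking the model to describe the resulting world state after each step before answering — can be sketched as follows. This is a minimal illustration assuming a generic `llm` callable; the function names and chunking heuristic are hypothetical, not the authors' reference implementation.

```python
def split_into_chunks(story: str, n_chunks: int) -> list[str]:
    """Split a ToM narrative into roughly equal discrete steps."""
    sentences = [s.strip() for s in story.split(".") if s.strip()]
    size = max(1, len(sentences) // n_chunks)
    return [". ".join(sentences[i:i + size]) + "."
            for i in range(0, len(sentences), size)]

def dwm_prompt(story: str, question: str, llm, n_chunks: int = 3) -> str:
    """After each chunk, ask the model to describe how the environment
    and the agents' beliefs changed, then answer the final question
    with the accumulated state descriptions in context."""
    context = ""
    for chunk in split_into_chunks(story, n_chunks):
        context += chunk + "\n"
        state = llm(context + "\nDescribe what changed in the "
                    "environment and what each agent believes now.")
        context += f"[State] {state}\n"
    return llm(context + f"\nQuestion: {question}\nAnswer:")
```

Interleaving the elicited state descriptions with the narrative is what distinguishes this from plain Chain of Thought, where all reasoning happens after the full story is presented.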
Theoretical and Practical Implications

The framework contributes theoretically to understanding ToM tasks by objectively quantifying their complexity through state event tracking, an approach that encourages transparency and robustness in future ToM benchmarks. Practically, the introduction of DWM illustrates an effective tool for enhancing LLM reasoning capabilities in tasks requiring deep social cognition.
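To make the state-event counting concrete, here is a toy illustration on the classic Sally-Anne task. The `Event` encoding and stateful/stateless labels are assumptions made for clarity; the paper's measure counts the state transitions needed to answer correctly while discounting spurious, stateless events.

```python
from dataclasses import dataclass

@dataclass
class Event:
    description: str
    changes_state: bool  # does it alter a variable relevant to the answer?

def tom_complexity(events: list[Event]) -> int:
    """Complexity = number of stateful events; stateless events are spurious."""
    return sum(1 for e in events if e.changes_state)

sally_anne = [
    Event("Sally puts the marble in the basket", True),
    Event("Sally leaves the room", True),           # Sally stops observing
    Event("Anne hums a tune", False),               # spurious, stateless
    Event("Anne moves the marble to the box", True),
]
# Under this encoding, the task requires tracking three state events.
```

A benchmark designer can inflate apparent difficulty by adding stateless events; under this measure they leave the complexity unchanged.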

Future Developments in AI

This work paves the way for further development in AI systems' social reasoning capabilities, suggesting that incorporating explicit state tracking in prompts can significantly close the gap between human and machine capabilities in ToM problems. Moreover, it lays the foundation for developing more sophisticated LLM behavioral models, potentially integrating dynamic epistemic logic or other formal reasoning methods to further bridge this gap.

Conclusion

The proposed framework and the DWM technique represent important steps toward refining our assessment of ToM capabilities in AI, demonstrating that even complex social reasoning problems can be systematically approached and improved through targeted input reorganization and structured prompting methods. This work highlights a crucial intersection of cognitive modeling and machine learning, providing insights and tools applicable to broader AI research domains, including autonomous systems and human-AI interaction.

Authors (6)
  1. X. Angelo Huang (3 papers)
  2. Emanuele La Malfa (21 papers)
  3. Samuele Marro (11 papers)
  4. Andrea Asperti (25 papers)
  5. Anthony Cohn (5 papers)
  6. Michael Wooldridge (59 papers)