
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs (2210.13312v2)

Published 24 Oct 2022 in cs.CL and cs.AI

Abstract: Social intelligence and Theory of Mind (ToM), i.e., the ability to reason about the different mental states, intents, and reactions of all people involved, allow humans to effectively navigate and understand everyday social interactions. As NLP systems are used in increasingly complex social situations, their ability to grasp social dynamics becomes crucial. In this work, we examine the open question of social intelligence and Theory of Mind in modern NLP systems from an empirical and theory-based perspective. We show that one of today's largest LLMs (GPT-3; Brown et al., 2020) lacks this kind of social intelligence out-of-the-box, using two tasks: SocialIQa (Sap et al., 2019), which measures models' ability to understand intents and reactions of participants of social interactions, and ToMi (Le et al., 2019), which measures whether models can infer mental states and realities of participants of situations. Our results show that models struggle substantially at these Theory of Mind tasks, with well-below-human accuracies of 55% and 60% on SocialIQa and ToMi, respectively. To conclude, we draw on theories from pragmatics to contextualize this shortcoming of LLMs, by examining the limitations stemming from their data, neural architecture, and training paradigms. Challenging the prevalent narrative that only scale is needed, we posit that person-centric NLP approaches might be more effective towards neural Theory of Mind. In our updated version, we also analyze newer instruction-tuned and RLHF models for neural ToM. We find that even ChatGPT and GPT-4 do not display emergent Theory of Mind; strikingly, even GPT-4 achieves only 60% accuracy on the ToMi questions related to mental states and realities.

Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs

This paper by Sap et al. examines the capacity of LLMs, focusing on GPT-3, to demonstrate social intelligence, particularly Theory of Mind (ToM). ToM is the ability to infer and reason about the mental states, intentions, and reactions of others, a crucial capability for navigating social interactions effectively. The research addresses the open question of whether modern NLP systems can grasp social dynamics and exhibit abilities comparable to human social cognition.

The authors use two benchmarks to evaluate social understanding: SocialIQa, which assesses commonsense reasoning about the intents and reactions of participants in social interactions, and ToMi, which tests whether models can infer participants' mental states and the realities they perceive. The paper demonstrates that LLMs struggle significantly with these tasks, as exemplified by GPT-3's performance: its accuracy on SocialIQa peaks at 55%, well below human performance, and its highest accuracy on ToMi questions requiring mental-state inference is 60%, again markedly lower than human accuracy.
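To make the evaluation setup concrete, here is a minimal sketch of how one might score a SocialIQa-style multiple-choice item with a causal LLM by comparing the log-likelihood each candidate answer receives given the context and question. The paper probes GPT-3 through the OpenAI API; the open-source stand-in model (gpt2), the prompt format, the option_logprob helper, and the example item below are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch: multiple-choice evaluation by comparing answer log-likelihoods.
# Assumes a Hugging Face causal LM as a stand-in for GPT-3.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical stand-in; the paper evaluates GPT-3 via API
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def option_logprob(context: str, option: str) -> float:
    """Sum of token log-probabilities of `option` conditioned on `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probs at position i predict the token at position i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    option_positions = range(ctx_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item() for pos in option_positions
    )

# Illustrative SocialIQa-style item (not taken from the dataset).
context = (
    "Jordan gave up their seat on the bus to an elderly passenger. "
    "Question: How would others feel about Jordan? Answer:"
)
options = ["grateful and impressed", "annoyed at Jordan", "indifferent"]
scores = {opt: option_logprob(context, opt) for opt in options}
print(max(scores, key=scores.get))
```

Selecting the highest-scoring option yields the model's prediction, which can then be compared against gold labels to compute accuracy over a benchmark.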

The paper attributes these limitations in social intelligence to underlying issues in LLMs' data, neural architecture, and training paradigms. Although these models are impressive at generating human-like text and responding in conversational contexts, they often fail at higher-order social reasoning, potentially because they rely on spurious correlations rather than genuine cognitive processing akin to human reasoning.

The authors also challenge the dominant narrative that merely scaling model size leads to improved performance in these social reasoning tasks. They argue that simply increasing model size is insufficient for achieving neural Theory of Mind, proposing instead that addressing the shortcomings may require more person-centric approaches or training paradigms that prioritize social context.

The implications of this work are significant for the practical deployment of AI systems in socially complex environments. The results suggest that while LLMs excel in certain tasks, there are inherent limitations when these models are asked to approximate human levels of empathy, social reasoning, or understanding. This insight necessitates further exploration into the training data and methodologies used to imbue models with social intelligence.

Future developments in AI may benefit from integrating more contextually rich and interaction-focused datasets, augmenting the models with person-centric architectures, and possibly combining these neural models with frameworks capable of experiential learning. As such, while the current capabilities are far from human-like ToM, adjusting the underlying approaches to training and model architecture might hold promise for more advanced, socially adept AI systems. This calls for a renewed focus on creating AI technologies that better emulate the complex nuances of human social reasoning.

Ultimately, this paper adds to the discussion of whether and how AI systems can achieve what is inherently a human capability: understanding complex social interactions and the myriad mental states that accompany them. It provides an empirically grounded perspective on the deficiencies of current models, pointing towards future research trajectories that might bridge the gap between human and machine understanding in social contexts.

Authors (4)
  1. Maarten Sap (86 papers)
  2. Ronan LeBras (4 papers)
  3. Daniel Fried (69 papers)
  4. Yejin Choi (287 papers)
Citations (175)