Language Models Represent Beliefs of Self and Others (2402.18496v3)

Published 28 Feb 2024 in cs.AI and cs.CL

Abstract: Understanding and attributing mental states, known as Theory of Mind (ToM), emerges as a fundamental capability for human social reasoning. While LLMs appear to possess certain ToM abilities, the mechanisms underlying these capabilities remain elusive. In this study, we discover that it is possible to linearly decode the belief status from the perspectives of various agents through neural activations of LLMs, indicating the existence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process. Additionally, our findings extend to diverse social reasoning tasks that involve different causal inference patterns, suggesting the potential generalizability of these representations.
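The core technique described is a linear probe: a classifier trained on a model's intermediate activations to predict the belief status of a given agent, whose learned weight vector can then be reused as a steering direction for the intervention experiments. Below is a minimal sketch of that recipe using scikit-learn; the activations, labels, dimensions, and intervention strength are all placeholders for illustration, not the paper's actual data or hyperparameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the paper's setup: one activation vector
# per ToM story (taken from some transformer layer) and a binary label for
# whether the target agent's belief is true (1) or false (0).
rng = np.random.default_rng(0)
activations = rng.normal(size=(512, 4096))    # (n_stories, hidden_dim)
belief_labels = rng.integers(0, 2, size=512)  # hypothetical annotations

X_train, X_test, y_train, y_test = train_test_split(
    activations, belief_labels, test_size=0.2, random_state=0
)

# Linear probe: if belief status is linearly decodable, a plain logistic
# regression on raw activations should beat chance on held-out stories.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"held-out probe accuracy: {probe.score(X_test, y_test):.3f}")

# The probe's weight vector doubles as a candidate "belief direction":
# adding (or subtracting) it from the activations is one simple way to
# manipulate the representation, in the spirit of the intervention results.
belief_direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
alpha = 5.0  # intervention strength, a hypothetical knob
steered = activations + alpha * belief_direction  # broadcasts over rows
```

Decoding beliefs "from the perspectives of various agents" then amounts to training separate probes on the same activations with per-agent labels (e.g., the protagonist's belief versus an omniscient observer's); that both are recoverable from one set of activations is what suggests distinct internal representations of self and others' beliefs.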

Authors (3)
  1. Wentao Zhu (73 papers)
  2. Zhining Zhang (2 papers)
  3. Yizhou Wang (162 papers)