Emulation vs. Emergent Convergence of ToM–Emotion Representations in LLMs

Determine whether the shared internal representations that link perspective-taking (Theory of Mind) and emotional context processing in large language models trained via language modeling objectives constitute genuine emulation of cognitive architecture or instead reflect emergent convergence on functionally equivalent representations.

Background

The paper analyzes how Contrastive Activation Addition (CAA) steering improves Theory of Mind (ToM) performance in Gemma-3-4B by examining changes detected by linear probes trained on 45 cognitive actions. The authors find that improved ToM performance aligns with increased emotional and generative processes and decreased analytical processes, suggesting that emotional understanding mediates successful perspective-taking in the evaluated scenarios.
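To make the two techniques named above concrete, here is a minimal sketch of the general idea, not the authors' implementation. It assumes the standard CAA recipe (a steering vector computed as the mean difference between activations on contrastive prompt sets, then added to hidden states) and a logistic linear probe as the readout. The activations are synthetic NumPy arrays rather than Gemma-3-4B hidden states, and every name (`caa_steering_vector`, `steer`, `probe_score`, `alpha`, the probe weights) is hypothetical.

```python
import numpy as np

def caa_steering_vector(pos_acts, neg_acts):
    """Mean activation difference between contrastive prompt sets."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to every hidden state (row)."""
    return hidden + alpha * vector

def probe_score(hidden, w, b):
    """Logistic linear probe: estimated probability that a concept is active."""
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))

# Synthetic stand-ins for layer activations (assumption: no real model is used).
rng = np.random.default_rng(0)
d = 16
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

# Contrastive sets differ along one hidden direction plus Gaussian noise.
pos = rng.normal(size=(32, d)) + 2.0 * direction
neg = rng.normal(size=(32, d)) - 2.0 * direction
v = caa_steering_vector(pos, neg)

# Hypothetical probe whose weights align with the synthetic direction.
w, b = direction, 0.0
hidden = rng.normal(size=(8, d))
before = probe_score(hidden, w, b).mean()
after = probe_score(steer(hidden, v, alpha=1.0), w, b).mean()
print(f"mean probe score before steering: {before:.3f}, after: {after:.3f}")
```

In the paper's setting, the analogous comparison would be made across probes for each of the 45 cognitive actions, before and after steering. The sketch shows only the mechanics for a single probe.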

Connecting these findings to neuroscience evidence that cognitive and affective ToM can share neural substrates, the authors hypothesize that LLMs may learn shared representations linking perspective-taking with emotional context processing during training. They explicitly note that it remains an open question whether such shared representations reflect genuine emulation of cognitive architecture or an emergent convergence on functionally equivalent representations.

References

During language modeling, networks may learn shared representations linking perspective-taking with emotional context processing, mirroring the compressed structure of human social cognition embedded in linguistic data. Whether this constitutes genuine emulation of cognitive architecture or emergent convergence on functionally equivalent representations remains an open question.

— Decomposing Theory of Mind: How Emotional Processing Mediates ToM Abilities in LLMs (2511.15895 - Chulo et al., 19 Nov 2025) in Discussion, Section "Discussion"
