Disentangling context-length effects from theory-of-mind demands in CharToM-QA
Determine whether the difficulty of CharToM-QA questions stems primarily from processing long input contexts (passages from novels exceeding 2,000 words) or from the theory-of-mind (ToM) reasoning that the questions themselves require.
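One way to make this question operational is a 2x2 factorial design. It crosses context length (the full novel passage vs. a short excerpt containing only the needed evidence) with question type (ToM questions vs. matched non-ToM control questions over the same passages). This is offered as a hypothetical sketch, not as the method of CharToM-QA or of the cited work. The names `Trial`, `cell_accuracy`, `decompose`, and the condition labels `"full"`, `"short"`, `"tom"`, `"control"` are assumptions introduced for illustration. The two main effects estimate each factor's cost in accuracy. The difference-in-differences term checks whether long context hurts ToM questions more than it hurts controls.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical condition labels; CharToM-QA does not define these.
CONTEXTS = ("full", "short")      # whole novel passage vs. evidence-only excerpt
QUESTIONS = ("tom", "control")    # ToM question vs. matched non-ToM question


@dataclass(frozen=True)
class Trial:
    """One scored model answer in the hypothetical 2x2 design."""
    context: str
    question: str
    correct: bool


def cell_accuracy(trials, context, question):
    """Fraction of correct answers within one (context, question) cell."""
    hits = [t.correct for t in trials if t.context == context and t.question == question]
    if not hits:
        raise ValueError(f"no trials for cell ({context}, {question})")
    return sum(hits) / len(hits)


def decompose(trials):
    """Split accuracy differences into length, ToM, and interaction terms.

    All values are accuracy penalties: positive means the factor hurts.
    """
    acc = {(c, q): cell_accuracy(trials, c, q) for c in CONTEXTS for q in QUESTIONS}
    # Cost of the long context, averaged over both question types.
    length_effect = mean(acc[("short", q)] - acc[("full", q)] for q in QUESTIONS)
    # Cost of the ToM demand, averaged over both context lengths.
    tom_effect = mean(acc[(c, "control")] - acc[(c, "tom")] for c in CONTEXTS)
    # Difference-in-differences: extra length penalty borne by ToM questions.
    interaction = ((acc[("short", "tom")] - acc[("full", "tom")])
                   - (acc[("short", "control")] - acc[("full", "control")]))
    return {"length_effect": length_effect,
            "tom_effect": tom_effect,
            "interaction": interaction}
```

Under this framing, a large `length_effect` alongside a small `tom_effect` would point to context length as the main obstacle, and the reverse would point to ToM reasoning. A clearly nonzero `interaction` means the two factors compound and cannot be separated from cell means alone. A real analysis would also need uncertainty estimates, for example bootstrapped intervals, and control questions matched in difficulty apart from the ToM demand.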
References
However, their questions are based on lengthy novels (over 2K words), making it unclear whether the challenge stems from context length or ToM understanding.
— Are LLMs Smarter Than Chimpanzees? An Evaluation on Perspective Taking and Knowledge State Estimation
(2601.12410 - Yang et al., 18 Jan 2026) in Section 2, Related Works — Theory of Mind (ToM) in LLMs