Mutual Theory of Mind
- Mutual Theory of Mind is defined as a recursive, bidirectional process in which agents continuously update explicit models of each other’s mental states.
- Computational models leverage recursive Bayesian updates and bounded rational hierarchies to achieve improved prediction and coordination in multi-agent tasks.
- Empirical research demonstrates practical benefits in human-AI collaboration, embodied robotics, and multi-modal social interactions.
Mutual Theory of Mind (mutual ToM, MToM) refers to the interactive, recursive, and bidirectional process by which two or more agents—human, artificial, or both—construct, update, and act upon explicit models of each other’s mental states (beliefs, desires, intentions, and perceptions). Unlike classical, one-sided Theory of Mind (ToM), which asks whether a single agent can attribute mental states to another, mutual ToM studies the continuous, co-adaptive dynamics between the cognitive models maintained by all parties, placing primary emphasis on the emergent properties of interaction rather than on isolated performance. This paradigm is increasingly operationalized in models and experiments for multi-agent AI, human-AI collaboration, embodied robotics, and computational social cognition (Yin et al., 3 Oct 2025, Wang et al., 2022, Shi et al., 2024, Zhang et al., 2024, Zhang et al., 28 Nov 2025).
1. Foundational Concepts and Formal Definition
Mutual Theory of Mind is characterized by recursive and coupled modeling: at each timestep $t$, every agent $i$ maintains a belief or structured representation $b_i^t(m_j)$ over the latent mental state $m_j$ of another agent $j$, while simultaneously constructing (or updating) higher-order beliefs about what that agent believes about them, e.g. $b_i^t(b_j(m_i))$, and so on. The core formalization is recursive, for example

$$b_i^{t+1}(m_j) = f\big(b_i^t(m_j),\, a_j^t,\, o_i^t\big),$$

where actions ($a_j^t$) and observations ($o_i^t$) mediate updates (Yin et al., 3 Oct 2025, Wang et al., 2022). A Bayesian belief-update variant is frequently employed:

$$b_i^{t+1}(s_j) \;\propto\; P(o_i^t \mid s_j, a_j^t)\, b_i^t(s_j),$$

where $b_i(s_j)$ is $i$'s belief about $j$'s state (Yin et al., 3 Oct 2025).
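The Bayesian update above can be sketched in a few lines. The following is a minimal illustration, not an implementation from any cited paper: it assumes a small discrete space of hypothesized mental states and a hand-chosen likelihood vector standing in for $P(o \mid s_j, a)$.

```python
import numpy as np

def bayes_update(belief, likelihood):
    """One step of b_i^{t+1}(s_j) ∝ P(o | s_j, a) * b_i^t(s_j)."""
    posterior = belief * likelihood       # elementwise prior × likelihood
    return posterior / posterior.sum()    # renormalize to a distribution

# Agent i's prior over three hypothesized mental states of agent j.
belief = np.array([0.5, 0.3, 0.2])
# Hypothetical likelihood P(o^t | s_j, a^t) of the observed action
# under each candidate state.
likelihood = np.array([0.1, 0.6, 0.3])

belief = bayes_update(belief, likelihood)
# Probability mass shifts toward the state that best explains the action.
```

The same update, applied to a distribution over the *other agent's beliefs* rather than over world states, yields the one-step belief-over-belief updates used throughout the literature.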
In multi-agent settings modeled as Interactive POMDPs (I-POMDPs), each agent's policy and belief-update step depend recursively on the interaction history and on estimated higher-order goals, e.g. beliefs about others' beliefs (Shi et al., 2024). This recursion is operationalized either to a finite depth or via compressed/proxy updates to keep it computable (Zhu et al., 27 Nov 2025).
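Finite-depth recursion can be made concrete with a level-k reasoner, shown here as a hedged sketch for a generic two-player matrix game (the payoff matrices and the uniform level-0 policy are illustrative assumptions, not a scheme from the cited works):

```python
import numpy as np

def level_k_policy(payoff_self, payoff_other, k):
    """Depth-limited recursive reasoning in a 2-player matrix game.

    Payoff matrices are indexed [own action, opponent action].
    A level-0 agent randomizes uniformly; a level-k agent best-responds
    to the policy it predicts a level-(k-1) opponent would play.
    """
    n_self, _ = payoff_self.shape
    if k == 0:
        return np.full(n_self, 1.0 / n_self)   # uninformed base policy
    # Model the opponent by recursing one level down with roles swapped.
    opp_policy = level_k_policy(payoff_other.T, payoff_self.T, k - 1)
    expected = payoff_self @ opp_policy        # expected payoff per own action
    policy = np.zeros(n_self)
    policy[np.argmax(expected)] = 1.0          # deterministic best response
    return policy

# Symmetric coordination game: both players prefer matching on action 0.
payoffs = np.array([[2.0, 0.0],
                    [0.0, 1.0]])
pi = level_k_policy(payoffs, payoffs, k=2)
```

Truncating `k` at 1 or 2 corresponds directly to the one- or two-level recursion that the surveyed frameworks adopt for tractability.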
2. Simulation, Experience, and the Scope of Mental State Modeling
A central distinction in mutual ToM research is between simulation—statistical or behavioral mimicry—and genuine cognitive experience. Pretrained LLMs, vision-language models, and RL agents can attain high accuracy on isolated ToM tasks primarily through pattern-matching learned from extensive data. However, such performance reflects simulation (predictive modeling of observed behavior) without embodiment, motivational drive, or first-person affect (Yin et al., 3 Oct 2025). Consequently, mutual ToM calls for dynamic, embodied, and interaction-based evaluation and modeling, foregrounding real-time adaptation and continuous feedback rather than static, third-party tests (Yin et al., 3 Oct 2025, Wang et al., 2022).
3. Formal Architectures and Algorithmic Implementations
Numerous computational models instantiate mutual ToM:
- Recursive Bayesian Belief Models: Recursive updating of agent i’s beliefs about other agents’ hidden states, sometimes including beliefs over the other’s beliefs (first- and second-order) (Shi et al., 2024, Zhu et al., 27 Nov 2025, Yuan et al., 2021). For example, in Multi-modal Multi-Agent ToM and Hanabi cooperation, recursive reasoning is truncated at one or two levels for tractability (Shi et al., 2024, Montes et al., 2022).
- Finite/Bounded Rational Hierarchies: Cognitive hierarchy models (with Poisson-Gamma priors) and best-response solvers compute mutual ToM at arbitrary but finite depth by maintaining explicit distributions over opponents’ types (ToM levels) and updating them via Bayesian inference (Zhu et al., 27 Nov 2025). Each agent reasons over its opponents' ToM depth, truncated to a fixed bound.
- BDI Hierarchies and Perspective-Tagged Action Streams: Embodied agents (e.g., MindPower) process multimodal inputs through chained modules—Perception, Mental Reasoning (covering both the agent's own and the other agent's beliefs, desires, and intentions), Decision, and Action—explicitly marking the “perspective” of each atomic action and recursively generating predictions of “what I think the human believes” (Zhang et al., 28 Nov 2025).
- Active Inference with Coupled Generative Models: Each agent maintains generative models over both its own and the other's hidden variables, using free energy minimization to recursively update beliefs and plan under uncertainty (with alternating tree-search over policy profiles) (Pitliya et al., 1 Aug 2025).
- Multi-agent RL with Intrinsic Mutual ToM Rewards: Agents jointly learn interpretable beliefs and policies; second-order belief predictions (predicting what others believe) are used as intrinsic rewards, directly motivating more accurate and interpretable social modeling (Oguntola et al., 2023).
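The bounded-hierarchy idea in the second bullet can be sketched as follows. This is an illustrative reconstruction under stated assumptions—the truncated Poisson prior over opponent ToM levels and the hand-written per-level policies are hypothetical, not taken from the cited model:

```python
import math
import numpy as np

def poisson_prior(max_level, tau):
    """Truncated Poisson(tau) prior over opponent ToM levels 0..max_level."""
    w = np.array([tau**k * math.exp(-tau) / math.factorial(k)
                  for k in range(max_level + 1)])
    return w / w.sum()

def update_level_belief(prior, action, action_probs_by_level):
    """Bayesian update of the belief over the opponent's ToM level,
    given an observed action and each level's predicted policy."""
    likelihood = np.array([p[action] for p in action_probs_by_level])
    post = prior * likelihood
    return post / post.sum()

# Hypothetical predicted policies of a level-0/1/2 opponent over 3 actions.
policies = [np.array([1/3, 1/3, 1/3]),   # level-0: uniform
            np.array([0.8, 0.1, 0.1]),   # level-1: prefers action 0
            np.array([0.1, 0.8, 0.1])]   # level-2: prefers action 1

belief = poisson_prior(max_level=2, tau=1.5)
belief = update_level_belief(belief, action=1, action_probs_by_level=policies)
# Observing action 1 shifts probability mass toward a level-2 opponent.
```

Keeping the distribution over levels (rather than committing to one) lets the agent hedge between opponents of different reasoning depth while remaining computable.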
4. Experimental Paradigms, Evaluation Metrics, and Empirical Insights
Modern mutual ToM paradigms emphasize ecological validity and dynamic interaction. Task design frequently relies on:
| Task Paradigm | Key Evaluative Metrics | References |
|---|---|---|
| Real-time collaborative workspaces | Prediction accuracy, adaptation latency, team performance, subjective trust | (Yin et al., 3 Oct 2025, Zhang et al., 2024) |
| Multi-modal household tasks | Belief/goal inference accuracy, joint planning log-likelihood | (Shi et al., 2024) |
| Multi-agent games (Hanabi, Overcooked) | Team score, communication efficiency, uniqueness of strategies, belief prediction | (Montes et al., 2022, Lim et al., 2020, Yuan et al., 2021) |
| RL benchmarks with ToM objectives | Episodic reward, belief prediction loss, second-order accuracy | (Oguntola et al., 2023, Zhang et al., 28 Nov 2025) |
| Human-AI communication studies | Subjective measures (trust, fluency), behavioral adherence to predicted model | (Wang et al., 2022, Yin et al., 3 Oct 2025) |
Quantitative results consistently demonstrate that even first- or second-order mutual ToM architectures substantially improve cooperation, coordination efficiency, adaptability, and subjective experience (e.g., “feeling understood”). MindPower, for instance, achieves a +12.49% improvement in action generation over GPT-4o baselines (Zhang et al., 28 Nov 2025), while the multi-modal, multi-agent LIMP closes most of the human–machine performance gap by leveraging explicit mutual ToM (Shi et al., 2024). However, in shared workspace studies, increased explicit communication can decrease objective team performance, highlighting the risk of overloading the communication channel in real time (Zhang et al., 2024).
5. Human–AI Mutuality: Trust, Reliance, and Divergence
In educational and collaborative contexts, mutual ToM models treat epistemic trust and behavioral reliance as distinct, orthogonally driven outcomes derived from a common mental model. In studies with graduate students, trust (a judgment of correctness/competence) may be higher for human experts, while reliance (actual usage) may be higher for AI owing to its social affordances (anonymity, accessibility) (Pal et al., 23 Jan 2026). This dissociation arises from the dual-pathway structure of mutual mental models. It implies that system design and trust-calibration interventions should decouple trust management from reliance management, and it motivates scaffolds, transparency mechanisms, and bridges between AI and human help channels (Pal et al., 23 Jan 2026).
6. Limitations, Computational Trade-Offs, and Design Recommendations
The operationalization of mutual ToM faces challenges of combinatorial explosion in recursive belief modeling, with most frameworks limiting explicit recursion to one or two levels for computational tractability (Shi et al., 2024, Zhu et al., 27 Nov 2025, Montes et al., 2022). In practice, truncated hierarchies, one-step belief-over-belief updates, and approximate Bayesian/MDP methods are standard. Other key limitations include:
- Generalization from synthetic tasks to real-world contexts (mobility, open-ended dialog, diverse user bases) is not yet fully demonstrated (Shi et al., 2024, Zhang et al., 28 Nov 2025).
- Most current systems capture beliefs, desires, and intentions, but rarely engage with affective or motivational dynamics central to embodied human ToM (Yin et al., 3 Oct 2025).
- Benchmarks typically involve two agents; scaling to multi-agent settings with more than two participants and richer social modalities (gaze, gesture, physiological cues) is a prominent direction (Shi et al., 2024, Wang et al., 2022).
Best practices for MToM system design and deployment include: dual transparency (exposing the system’s knowledge state), tunable proactivity, explainable-by-interaction modules, domain-boundary signifiers, and mechanisms for maintaining and revising shared mental models through ongoing interaction and feedback (Weisz et al., 2024, Yin et al., 3 Oct 2025, Wang et al., 2022).
7. Open Challenges and Prospective Research Frontiers
Immediate open problems in mutual ToM research include:
- Developing efficient, scalable algorithms for higher-order and multi-party mutual ToM, possibly using compressive or sampling-based approximations (Zhu et al., 27 Nov 2025).
- Integrating richer multi-modal signals (e.g., nonverbal behavior, speech prosody) and affective reasoning (Bortoletto et al., 2024, Wang et al., 2022).
- Extending architectures from cooperative to mixed-motive (competitive/collaborative) and real-world multi-domain environments.
- Elucidating the longitudinal dynamics of mutual model adaptation, as found in long-term educational interactions or workplace teams (Wang et al., 2022).
- Articulating normative and ethical boundaries for how much mutual model transparency and adaptation should be supported—particularly in light of privacy, manipulation, and over-reliance risks (Pal et al., 23 Jan 2026, Weisz et al., 2024).
By reframing Theory of Mind as inherently mutual, adaptive, and interaction-driven, research in mutual ToM is enabling new classes of algorithms and empirical studies that more faithfully capture the complexities of real-world social cognition and collaboration between humans and AI systems (Yin et al., 3 Oct 2025).