SimToM: Simulated Theory-of-Mind Algorithms
- SimToM is a computational framework that simulates an agent’s perspective by isolating accessible information and inferring latent mental states.
- It employs a two-stage process, separating perspective-taking from targeted mental-state inference for more accurate Theory-of-Mind tasks.
- The approach enhances performance on false-belief benchmarks and agentic applications without requiring additional model retraining.
Simulated Theory-of-Mind (SimToM) refers to a class of algorithms and agent architectures that equip machine learning systems—particularly LLMs—with the capability to simulate the mental states of human or artificial agents. Drawing directly on Simulation Theory from cognitive science, SimToM methods explicitly model an agent’s beliefs, constraints, and preferences through “perspective-taking” before answering queries or taking actions. These methods operationalize Theory-of-Mind (ToM) reasoning in LLMs or agentic frameworks and enable enhanced interpretation of user intent, context-driven reasoning, and interpersonal prediction—even in underspecified or ambiguous scenarios (Wilf et al., 2023, Sarangi et al., 15 Jan 2025, Zhou et al., 24 Oct 2025).
1. Theoretical Foundations and Motivation
Simulation Theory posits that humans understand others’ mental states by simulating them internally: first adopting the other’s perspective (perspective-taking), then inferring beliefs or intentions from within that simulated viewpoint. This contrasts with “Theory-Theory,” which infers mental states via abstract rules or general knowledge. In the computational context, SimToM enables models to decompose ToM tasks into a perspective-filtering phase—isolating only knowledge available to the agent in question—and a reasoning phase, allowing more human-like inference of latent goals and beliefs (Wilf et al., 2023).
SimToM approaches are motivated by the observation that standard zero-shot and chain-of-thought (CoT) prompting often fail on canonical ToM tasks (notably, false-belief scenarios), as they collapse both perspective-taking and mental-state inference into a single, undifferentiated step (Wilf et al., 2023, Sarangi et al., 15 Jan 2025). By explicitly separating these phases, SimToM produces substantial improvements in both benchmarks and interactive agent applications.
2. SimToM: Prompt-Based Two-Stage ToM Inference
SimToM implementations share a core methodological pattern:
- Perspective-Taking (Simulation Stage): The model is prompted to reproduce the scenario or sequence of events from the viewpoint of the target agent, displaying only information the agent would know.
- Mental-State Inference (Question-Answering Stage): The model answers the ToM query based on the filtered, agent-specific context.
In formal notation (Wilf et al., 2023):
- Let s denote the sequence of events, a the target agent, and q the ToM query.
- The agent-specific knowledge is s_a ⊆ s, the subset of events that a can perceive.
- Stage 1 (perspective-taking): s_a = PerspectiveTake(s, a).
- Stage 2 (question answering): answer = QA(s_a, q).
Prompt templates operationalize these stages:
- Stage 1: "The following is a sequence of events: {story}. Which events does {character} know about?"
- Stage 2: "{filtered episode}. Answer the following question: {ToM query}."
No gradient updates or model retraining are required; carefully designed prompts suffice. Evaluation typically uses multiple-choice accuracy on standardized ToM datasets.
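The two prompt templates above can be wired into a minimal two-stage driver. The sketch below is illustrative, not the cited implementation: `call_llm` is a stub standing in for any chat-completion API, so the example is self-contained.

```python
# Minimal sketch of SimToM's two-stage prompting pattern.
# `call_llm` is a placeholder; swap in a real model call in practice.

PERSPECTIVE_PROMPT = (
    "The following is a sequence of events: {story}\n"
    "Which events does {character} know about?"
)
QA_PROMPT = "{filtered_story}\nAnswer the following question: {question}"


def call_llm(prompt: str) -> str:
    # Stub for a real chat-completion endpoint.
    return "<model output for: " + prompt[:40] + "...>"


def simtom_answer(story: str, character: str, question: str) -> str:
    # Stage 1: perspective-taking -- restate only what the agent knows.
    filtered = call_llm(
        PERSPECTIVE_PROMPT.format(story=story, character=character)
    )
    # Stage 2: answer the ToM query from the agent-specific context only.
    return call_llm(QA_PROMPT.format(filtered_story=filtered, question=question))
```

Because the second call sees only the filtered episode, the model cannot accidentally condition on events the agent never observed.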
3. Algorithmic Blueprint and Variants
A canonical SimToM implementation involves the following subroutines (Sarangi et al., 15 Jan 2025):
| Subroutine | Functionality |
|---|---|
| GET_AGENT | Parse query to extract target (“outermost”) agent |
| REPHRASE_Q | Reframe question for agent’s perspective |
| SIMULATE_PERSPECTIVE | Prompt: restate only events known to agent |
| ANSWER_FROM_STORY | Prompt: answer the ToM question with agent-view story |
Pseudocode:
```
agent     = GET_AGENT(question)
new_q     = REPHRASE_Q(agent, question)
sim_story = SIMULATE_PERSPECTIVE(agent, story)
answer    = ANSWER_FROM_STORY(sim_story, new_q, choices)
return answer
```
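A runnable rendering of the four-subroutine pipeline follows; each subroutine becomes a separate LLM call, and the prompts shown are illustrative paraphrases rather than the paper's exact templates.

```python
# Sketch of the four-subroutine SimToM pipeline. The `llm` parameter is any
# callable mapping a prompt string to a completion string; the prompts here
# are illustrative, not the published templates.
from typing import Callable, List


def simtom_pipeline(
    story: str,
    question: str,
    choices: List[str],
    llm: Callable[[str], str],
) -> str:
    # GET_AGENT: parse the query to extract the outermost agent.
    agent = llm(f"Whose perspective does this question ask about?\n{question}")
    # REPHRASE_Q: reframe the question for that agent's perspective.
    new_q = llm(f"Rephrase from {agent}'s perspective:\n{question}")
    # SIMULATE_PERSPECTIVE: restate only the events the agent knows about.
    sim_story = llm(f"Events:\n{story}\nRetell only what {agent} knows.")
    # ANSWER_FROM_STORY: answer the ToM question with the agent-view story.
    return llm(
        f"{sim_story}\nQuestion: {new_q}\nChoices: {choices}\nAnswer:"
    )
```

Keeping `llm` as an injected callable makes the pipeline backend-agnostic, which matches the training-free, prompt-only character of the method.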
SimToM-inspired modules are also incorporated into agent frameworks such as ToM-SWE, where a dedicated ToM agent persistently infers and maintains the user’s latent state across engineering sessions, representing the user’s goals, constraints, and preferences with evolving hierarchical memory structures (Zhou et al., 24 Oct 2025).
4. Evaluation Benchmarks and Quantitative Results
Theory-of-Mind Story Tasks
Standard ToM benchmarks include Sally–Anne style scenarios (ToMI), naturalistic false-belief narratives (BigTOM), and conversationally embedded ToM tasks (FANToM, Hi-ToM). Quantitative comparisons demonstrate that SimToM yields significant accuracy gains over zero-shot and CoT prompting baselines.
| Model | BigTOM FB | Δ vs 0-shot/CoT | ToMI FB | Δ vs 0-shot/CoT |
|---|---|---|---|---|
| Llama2-7b-chat | 70.5% | +23.0 / +39.0 | 40.0% | +11.8 / +16.0 |
| GPT-3.5-Turbo | 70.5% | +29.5 / +14.2 | 81.0% | +13.8 / +47.0 |
| GPT-4 | 92.0% | +3.0 / –1.2 | 87.8% | +62.2 / +13.5 |
On Hi-ToM and FANToM, SimToM improves over MC and CoT baselines, but fails to match recursive approaches like Decompose-ToM on higher-order tasks (Sarangi et al., 15 Jan 2025).
| Model | Hi-ToM 1st | 2nd | 3rd | 4th | FANToM-Short | Long |
|---|---|---|---|---|---|---|
| MC Baseline | 60.0 | 40.8 | 22.5 | 21.7 | 55.0 | 44.0 |
| CoT | 74.2 | 53.3 | 45.0 | 42.5 | 74.0 | 64.7 |
| SimToM | 75.0 | 55.0 | 48.3 | 43.3 | 90.4 | 84.7 |
| Decompose-ToM | 76.7 | 86.7 | 87.5 | 83.3 | 88.4 | 86.2 |
Agentic ToM in Software Engineering
In ToM-SWE (Zhou et al., 24 Oct 2025), SimToM-style user modeling boosts both software-engineering benchmark success rates and user satisfaction:
| Agent | Ambiguous SWE | Stateful SWE | User Satisfaction |
|---|---|---|---|
| CodeAct | 51.9% | 13.5% | 2.57 |
| +RAG | 56.0% | 18.7% | 3.09 |
| +ToM (SimToM) | 63.4% | 57.4% | 3.62 |
These results validate the practical significance of simulated perspective-taking and persistent user modeling.
5. Analyses, Strengths, and Limitations
Strengths:
- Substantial improvements for first-order and simple second-order ToM tasks.
- Modular, prompt-based, and training-free, facilitating adoption across LLM backends and domains.
- Demonstrated value in agentic workflows for software engineering, with persistent user state inference and stateful adaptation (Zhou et al., 24 Oct 2025).
Limitations:
- No inherent recursion; performance degrades for higher-order nested belief tasks (Sarangi et al., 15 Jan 2025).
- Filtering/hiding events does not generalize to settings requiring inference of truly unseen or hypothetical states (Wilf et al., 2023).
- Relies on LLMs’ latent ToM-relevant knowledge, possibly limiting applicability to smaller models or non-instruction-tuned architectures.
- Evaluations to date focus on structured, stylized benchmarks; generalization to real-world, open-domain, or adversarial ToM remains an open area.
Notable ablation findings:
- Merging both stages into a single prompt degrades performance to near-CoT levels.
- Optimal perspective-taking (oracle-labeled context) enables close to perfect accuracy, indicating that context isolation is a key bottleneck for contemporary LLMs.
6. Extensions, Applications, and Recommendations
SimToM’s dual-stage, perspective-driven pattern is being extended across domains:
- In agent frameworks, SimToM-inspired ToM agents persistently encode individual user goals and preferences as structured memory, facilitating user-aligned interactions (Zhou et al., 24 Oct 2025).
- In conversational settings, recursive extensions (e.g., Decompose-ToM) decompose complex queries across multiple agent points of view (Sarangi et al., 15 Jan 2025).
- Practical recommendations emphasize separating perspective-taking from task execution, using hierarchical user profiles, and prompt templates guiding structured extraction of goals, constraints, and preferences.
Recommendations for generalizing SimToM (Zhou et al., 24 Oct 2025):
- Maintain clear separation between “doer” and “thinker” modules (dual-agent architecture).
- Structure memory for persistent user modeling (raw session logs → session summaries → global profiles).
- Balance context retrieval and model cost, leveraging small LLMs for SimToM wherever possible.
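The three-level memory structure in the recommendations above (raw session logs → session summaries → global profiles) can be sketched as a simple data model. This is an illustrative Python rendering, not the ToM-SWE implementation; all class and field names are assumptions.

```python
# Illustrative sketch of a three-level persistent user-modeling memory:
# raw session logs -> per-session summaries -> a global user profile.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionLog:
    session_id: str
    events: List[str] = field(default_factory=list)  # raw interaction trace


@dataclass
class SessionSummary:
    session_id: str
    inferred_goals: List[str]
    inferred_constraints: List[str]
    inferred_preferences: List[str]


@dataclass
class UserProfile:
    goals: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    preferences: List[str] = field(default_factory=list)

    def merge(self, summary: SessionSummary) -> None:
        # Fold a per-session summary into the persistent global profile,
        # deduplicating entries. A real system would likely use a small LLM
        # to reconcile conflicting or stale items instead of exact matching.
        for attr in ("goals", "constraints", "preferences"):
            current = getattr(self, attr)
            for item in getattr(summary, "inferred_" + attr):
                if item not in current:
                    current.append(item)
```

Summarizing sessions before merging keeps the global profile compact, which is what lets a small, cheap model serve the "thinker" role while the "doer" agent retrieves only the condensed profile.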
7. Future Directions
Research highlights several promising directions:
- Building and evaluating on lifelike, open-domain ToM benchmarks with richer, less stylized mental-state scenarios.
- Systematic analysis of LLM failures: hallucination versus inference trade-offs, especially in ambiguous/missing data contexts.
- Integrating world model updates and symbolic reasoning for higher-order and group Theory-of-Mind.
- Enhancing SimToM with domain-specific few-shot prompt examples for specialized deployment.
- Expansion into new agent domains—creative writing, data analysis, education—by adapting and extending the SimToM dual-agent template and memory hierarchy (Zhou et al., 24 Oct 2025).
A plausible implication is that explicit, modular simulation of other agents’ perspectives within LLM architectures will continue to be central to progress on machine Theory-of-Mind, with SimToM methods providing a baseline pattern for future development and adaptation (Wilf et al., 2023, Sarangi et al., 15 Jan 2025, Zhou et al., 24 Oct 2025).