
SimToM: Simulated Theory-of-Mind Algorithms

Updated 10 February 2026
  • SimToM is a computational framework that simulates an agent’s perspective by isolating accessible information and inferring latent mental states.
  • It employs a two-stage process, separating perspective-taking from targeted mental-state inference for more accurate Theory-of-Mind tasks.
  • The approach enhances performance on false-belief benchmarks and agentic applications without requiring additional model retraining.

Simulated Theory-of-Mind (SimToM) refers to a class of algorithms and agent architectures that equip machine learning systems—particularly LLMs—with the capability to simulate the mental states of human or artificial agents. Drawing directly on Simulation Theory from cognitive science, SimToM methods explicitly model an agent’s beliefs, constraints, and preferences through “perspective-taking” before answering queries or taking actions. These methods operationalize Theory-of-Mind (ToM) reasoning in LLMs or agentic frameworks and enable enhanced interpretation of user intent, context-driven reasoning, and interpersonal prediction—even in underspecified or ambiguous scenarios (Wilf et al., 2023, Sarangi et al., 15 Jan 2025, Zhou et al., 24 Oct 2025).

1. Theoretical Foundations and Motivation

Simulation Theory posits that humans understand others’ mental states by simulating them internally: first adopting the other’s perspective (perspective-taking), then inferring beliefs or intentions from within that simulated viewpoint. This contrasts with “Theory-Theory,” which infers mental states via abstract rules or general knowledge. In the computational context, SimToM enables models to decompose ToM tasks into a perspective-filtering phase—isolating only knowledge available to the agent in question—and a reasoning phase, allowing more human-like inference of latent goals and beliefs (Wilf et al., 2023).

SimToM approaches are motivated by the observation that standard zero-shot and chain-of-thought (CoT) prompting often fail on canonical ToM tasks (notably, false-belief scenarios), as they collapse both perspective-taking and mental-state inference into a single, undifferentiated step (Wilf et al., 2023, Sarangi et al., 15 Jan 2025). By explicitly separating these phases, SimToM produces substantial improvements in both benchmarks and interactive agent applications.

2. SimToM: Prompt-Based Two-Stage ToM Inference

SimToM implementations share a core methodological pattern:

  1. Perspective-Taking (Simulation Stage): The model is prompted to reproduce the scenario or sequence of events from the viewpoint of the target agent, displaying only information the agent would know.
  2. Mental-State Inference (Question-Answering Stage): The model answers the ToM query based on the filtered, agent-specific context.

In formal notation (Wilf et al., 2023):

  • Let $S = \{e_1, e_2, \ldots, e_n\}$ denote a sequence of events, $a$ the target agent, and $q$ the ToM query.
  • The agent-specific knowledge is $\mathcal{K}_a(S) = \{\, e_i \in S : a \text{ is aware of } e_i \,\}$.
  • Stage 1: $K \leftarrow \mathrm{LLM}_{P1}(S, a) \approx \mathcal{K}_a(S)$.
  • Stage 2: $\mathit{answer} \leftarrow \mathrm{LLM}_{P2}(K, q)$.

Prompt templates operationalize these stages:

  • Stage 1: "The following is a sequence of events: {story}. Which events does {character} know about?"
  • Stage 2: "{filtered episode}. Answer the following question: {ToM query}."

No gradient updates or model retraining are required; prompt engineering alone suffices. Evaluation typically uses multiple-choice accuracy on standardized ToM datasets.
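The two stages above can be sketched as a short pipeline. This is a minimal illustration using the prompt templates from this section; `llm` stands in for any text-completion callable, and `toy_llm` is a hypothetical stub (not a real model) so the control flow can run end to end.

```python
# Minimal sketch of SimToM's two-stage prompting, using the templates above.
# `toy_llm` is an illustrative stub, not part of any published implementation.

STAGE1 = ("The following is a sequence of events: {story}. "
          "Which events does {character} know about?")
STAGE2 = "{filtered}. Answer the following question: {query}"

def simtom_answer(llm, story, character, query):
    """Stage 1: filter to the agent's perspective; Stage 2: answer within it."""
    filtered = llm(STAGE1.format(story=story, character=character))
    return llm(STAGE2.format(filtered=filtered, query=query))

def toy_llm(prompt):
    """Stub perspective-taker: keeps only events that mention the character."""
    if "Which events does" in prompt:
        story = prompt.split("events: ")[1].split(". Which")[0]
        character = prompt.split("does ")[1].split(" know")[0]
        return ". ".join(e for e in story.split(". ") if character in e)
    return prompt  # stage 2: echo the filtered context plus the question

story = "Sally puts the ball in the basket. Anne moves the ball to the box"
context = simtom_answer(toy_llm, story, "Sally",
                        "Where will Sally look for the ball?")
# Anne's swap never reaches stage 2, so a real model would answer "the basket".
```

The key design point is that stage 2 never sees the full story, only the agent-filtered context, which is what prevents the model from leaking privileged information into the answer.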

3. Algorithmic Blueprint and Variants

A canonical SimToM implementation involves the following subroutines (Sarangi et al., 15 Jan 2025):

| Subroutine | Functionality |
| --- | --- |
| GET_AGENT | Parse the query to extract the target ("outermost") agent |
| REPHRASE_Q | Reframe the question from the agent's perspective |
| SIMULATE_PERSPECTIVE | Prompt: restate only the events known to the agent |
| ANSWER_FROM_STORY | Prompt: answer the ToM question using the agent-view story |

Pseudocode:

    agent = GET_AGENT(question)
    new_q = REPHRASE_Q(agent, question)
    sim_story = SIMULATE_PERSPECTIVE(agent, story)
    answer = ANSWER_FROM_STORY(sim_story, new_q, choices)
    return answer

This approach is highly modular, requires no additional model training, and generalizes well to first-order and some second-order ToM tasks. It is, however, not inherently recursive: for nested ToM queries (e.g., "Where does A think B thinks...?"), SimToM simulates only the outermost (A's) perspective (Sarangi et al., 15 Jan 2025).

SimToM-inspired modules are also incorporated into agent frameworks such as ToM-SWE, where a dedicated ToM agent persistently infers and maintains the user’s latent state across engineering sessions, representing goals ($g$), constraints ($C$), and preferences ($P$) with evolving hierarchical memory structures (Zhou et al., 24 Oct 2025).
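A latent user state of this kind can be pictured as a small persistent record. The class and field names below are illustrative assumptions in the spirit of ToM-SWE's $(g, C, P)$ triple, not its actual code:

```python
from dataclasses import dataclass, field

@dataclass
class UserMentalState:
    """Illustrative persistent latent user state: goals g, constraints C, preferences P."""
    goals: list = field(default_factory=list)        # g: what the user wants done
    constraints: list = field(default_factory=list)  # C: hard requirements
    preferences: list = field(default_factory=list)  # P: soft stylistic choices

    def update(self, goals=(), constraints=(), preferences=()):
        """Fold inferences from the latest session into the persistent record."""
        self.goals.extend(goals)
        self.constraints.extend(constraints)
        self.preferences.extend(preferences)

state = UserMentalState()
state.update(goals=["migrate tests to pytest"],
             constraints=["no new dependencies"])
```

In a real agent, `update` would be driven by LLM inferences over session transcripts rather than hand-written lists.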

4. Evaluation Benchmarks and Quantitative Results

Theory-of-Mind Story Tasks

Standard ToM benchmarks include Sally–Anne style scenarios (ToMI), naturalistic false-belief narratives (BigTOM), and conversationally embedded ToM tasks (FANToM, Hi-ToM). Quantitative comparisons demonstrate that SimToM yields significant accuracy gains over zero-shot and CoT prompting baselines.

| Model | BigTOM FB | Δ vs 0-shot / CoT | ToMI FB | Δ vs 0-shot / CoT |
| --- | --- | --- | --- | --- |
| Llama2-7b-chat | 70.5% | +23.0 / +39.0 | 40.0% | +11.8 / +16.0 |
| GPT-3.5-Turbo | 70.5% | +29.5 / +14.2 | 81.0% | +13.8 / +47.0 |
| GPT-4 | 92.0% | +3.0 / –1.2 | 87.8% | +62.2 / +13.5 |

On Hi-ToM and FANToM, SimToM improves over MC and CoT baselines, but fails to match recursive approaches like Decompose-ToM on higher-order tasks (Sarangi et al., 15 Jan 2025).

| Model | Hi-ToM 1st | 2nd | 3rd | 4th | FANToM Short | FANToM Long |
| --- | --- | --- | --- | --- | --- | --- |
| MC Baseline | 60.0 | 40.8 | 22.5 | 21.7 | 55.0 | 44.0 |
| CoT | 74.2 | 53.3 | 45.0 | 42.5 | 74.0 | 64.7 |
| SimToM | 75.0 | 55.0 | 48.3 | 43.3 | 90.4 | 84.7 |
| Decompose-ToM | 76.7 | 86.7 | 87.5 | 83.3 | 88.4 | 86.2 |

Agentic ToM in Software Engineering

In ToM-SWE (Zhou et al., 24 Oct 2025), SimToM-style user modeling improves both software-engineering benchmark success rates and user satisfaction:

| Agent | Ambiguous SWE | Stateful SWE | User Satisfaction |
| --- | --- | --- | --- |
| CodeAct | 51.9% | 13.5% | 2.57 |
| +RAG | 56.0% | 18.7% | 3.09 |
| +ToM (SimToM) | 63.4% | 57.4% | 3.62 |

These results validate the practical significance of simulated perspective-taking and persistent user modeling.

5. Analyses, Strengths, and Limitations

Strengths:

  • Substantial improvements for first-order and simple second-order ToM tasks.
  • Modular, prompt-based, and training-free, facilitating adoption across LLM backends and domains.
  • Demonstrated value in agentic workflows for software engineering, with persistent user state inference and stateful adaptation (Zhou et al., 24 Oct 2025).

Limitations:

  • No inherent recursion; performance degrades for higher-order nested belief tasks (Sarangi et al., 15 Jan 2025).
  • Filtering/hiding events does not generalize to settings requiring inference of truly unseen or hypothetical states (Wilf et al., 2023).
  • Relies on LLMs’ latent ToM-relevant knowledge, possibly limiting applicability to smaller models or non-instruction-tuned architectures.
  • Evaluations to date focus on structured, stylized benchmarks; generalization to real-world, open-domain, or adversarial ToM remains an open area.

Notable ablation findings:

  • Merging both stages into a single prompt degrades performance to near-CoT levels.
  • Optimal perspective-taking (oracle-labeled context) enables close to perfect accuracy, indicating that context isolation is a key bottleneck for contemporary LLMs.

6. Extensions, Applications, and Recommendations

SimToM’s dual-stage, perspective-driven pattern is being extended across domains:

  • In agent frameworks, SimToM-inspired ToM agents persistently encode individual user goals and preferences as structured memory, facilitating user-aligned interactions (Zhou et al., 24 Oct 2025).
  • In conversational settings, recursive extensions (e.g., Decompose-ToM) decompose complex queries across multiple agent points of view (Sarangi et al., 15 Jan 2025).
  • Practical recommendations emphasize separating perspective-taking from task execution, using hierarchical user profiles, and employing prompt templates that guide structured extraction of goals, constraints, and preferences.

Recommendations for generalizing SimToM (Zhou et al., 24 Oct 2025):

  • Maintain clear separation between “doer” and “thinker” modules (dual-agent architecture).
  • Structure memory for persistent user modeling (raw session logs → session summaries → global profiles).
  • Balance context retrieval and model cost, leveraging small LLMs for SimToM wherever possible.
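The recommended three-tier memory (raw session logs → session summaries → global profiles) can be sketched as a toy data structure. The class, the naive summarizer, and the profile-rebuild policy below are illustrative assumptions, not ToM-SWE's actual implementation:

```python
# Toy sketch of three-tier user memory: raw logs -> per-session summaries ->
# a compact global profile. All names and policies here are hypothetical.

class UserMemory:
    def __init__(self):
        self.raw_logs = []           # tier 1: verbatim session events
        self.session_summaries = []  # tier 2: one short summary per session
        self.global_profile = ""     # tier 3: evolving cross-session profile

    def end_session(self, events, summarize):
        """Archive a finished session and refresh the summary tiers."""
        self.raw_logs.append(list(events))
        self.session_summaries.append(summarize(events))
        # Simplest possible policy: rebuild the profile from all summaries.
        self.global_profile = " | ".join(self.session_summaries)

mem = UserMemory()
mem.end_session(["asked for pytest migration", "rejected extra dependencies"],
                summarize=lambda ev: f"{len(ev)} events; last: {ev[-1]}")
```

In practice the `summarize` callable would be a small LLM, consistent with the cost-balancing recommendation above: cheap summarization at session end, with the compact global profile retrieved into context instead of full logs.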

7. Future Directions

Research highlights several promising directions:

  • Building and evaluating on lifelike, open-domain ToM benchmarks with richer, less stylized mental-state scenarios.
  • Systematic analysis of LLM failures: hallucination versus inference trade-offs, especially in ambiguous/missing data contexts.
  • Integrating world model updates and symbolic reasoning for higher-order and group Theory-of-Mind.
  • Enhancing SimToM with domain-specific few-shot prompt examples for specialized deployment.
  • Expansion into new agent domains—creative writing, data analysis, education—by adapting and extending the SimToM dual-agent template and memory hierarchy (Zhou et al., 24 Oct 2025).

A plausible implication is that explicit, modular simulation of other agents’ perspectives within LLM architectures will continue to be central to progress on machine Theory-of-Mind, with SimToM methods providing a baseline pattern for future development and adaptation (Wilf et al., 2023, Sarangi et al., 15 Jan 2025, Zhou et al., 24 Oct 2025).
