Zero Belief History (ZBH)
- Zero Belief History (ZBH) is defined as a belief state derived exclusively from the current context, omitting any reliance on past observations.
- ZBH is applied in machine Theory of Mind benchmarks, such as the 'Pick the Right Stuff' task, to assess immediate perspective-taking in LLMs.
- In dynamic economic models, zero-belief histories are scenarios an agent assigns zero probability; heterogeneous beliefs about such events drive subjective asset price bubbles.
Zero Belief History (ZBH) denotes settings in which an agent’s current belief state is conditioned solely on present context, with no access to or requirement for prior observation history. The notion appears in two otherwise separate lines of theoretical work: machine Theory of Mind (ToM) and economic equilibrium under belief heterogeneity. In the context of LLMs, ZBH tasks pose the minimal perspective-taking challenge: inferring what another entity believes “right now,” without reference to any past sequence of observations or events. In financial equilibrium theory, ZBH refers to the set of scenarios an agent considers impossible under its subjective probability measure; when agents disagree about these zero-probability events, subjective asset price bubbles arise.
1. Formal Definition and Mathematical Foundations
ZBH is defined over environments with discrete time steps $t = 1, 2, \dots$. At each time $t$, the world is described by a state $s_t$ and a context $c_t$ (all perceptual or observational data available at $t$), and an agent $i$ holds a belief $b_t^i$, which is a probability distribution or proposition set over possible world states. For LLMs, $b_t^i$ is the model’s internal “belief” state.
In full generality, agent $i$’s belief depends on the entire observation history $o_{1:t} = (o_1, \dots, o_t)$:

$$b_t^i = f(o_1, o_2, \dots, o_t),$$

where $o_k$ is the observation at time $k$.
ZBH imposes the constraint

$$b_t^i = f(c_t),$$

that is, $b_t^i$ is computable solely from $c_t$; $b_t^i$ is independent of all prior contexts $c_1, \dots, c_{t-1}$.
A query in ZBH asks, for a proposition $\varphi$: “Does agent $i$ believe $\varphi$ at time $t$?” The answer is computed from $c_t$ alone, with no dependence on the “belief gap” created by unobserved or recalled prior observations (Tang et al., 2024).
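The constraint $b_t^i = f(c_t)$ can be checked on sample data: any belief function that satisfies it must give the same answer to every history ending in the same current context. The sketch below is a minimal illustration under stated assumptions; the names `satisfies_zbh`, `current_only` and `remembers_past`, and the string contexts, are hypothetical and not taken from Tang et al.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# A history is the tuple of contexts (c_1, ..., c_t); its last element is c_t.
History = Tuple[Hashable, ...]


def satisfies_zbh(belief: Callable[[History], bool], histories: Iterable[History]) -> bool:
    """Check b_t = f(c_t) on a finite sample: histories that end in the same
    current context must receive the same belief. A sample can refute the
    ZBH constraint but cannot prove it for unseen histories."""
    belief_for_context: Dict[Hashable, bool] = {}
    for h in histories:
        c_t, b_t = h[-1], belief(h)
        if belief_for_context.setdefault(c_t, b_t) != b_t:
            return False
    return True


# Hypothetical proposition phi: "the item is in slot 3".
def current_only(h: History) -> bool:
    # Believes phi iff the current context shows slot 3 (ignores the past).
    return h[-1] == "monitor shows slot 3"


def remembers_past(h: History) -> bool:
    # Believes phi iff slot 3 was shown at any time (depends on history).
    return "monitor shows slot 3" in h


SAMPLE = [
    ("monitor shows slot 3", "monitor blank"),
    ("monitor blank", "monitor blank"),
    ("monitor blank", "monitor shows slot 3"),
]

print(satisfies_zbh(current_only, SAMPLE))    # True
print(satisfies_zbh(remembers_past, SAMPLE))  # False
```

`remembers_past` fails because the first two histories share $c_t$ = "monitor blank" but receive different beliefs, which is exactly the history dependence that ZBH rules out.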
2. Relation to Finite and Infinite Belief Histories
ZBH contrasts sharply with two broader classes:
| Setting | Data Used for Belief Inference | Formal Belief Function |
|---|---|---|
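| ZBH | Only current context $c_t$ | $b_t^i = f(c_t)$ |
| Finite Belief History (FBH) | Bounded window of the $k$ previous observations plus the current one, $(o_{t-k}, \dots, o_t)$ | $b_t^i = f(o_{t-k}, \dots, o_t)$ |
| Infinite Belief History (IBH) | Entire history $o_{1:t}$ or unbounded implicit structure | $b_t^i = f(o_1, \dots, o_t)$ |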
In FBH, the agent’s belief is chained through a fixed window of $k$ previous observations. In IBH, the agent integrates information over unbounded or procedural pasts. ZBH is uniquely characterized by the absence of any explicit history term in the updating mechanism (Tang et al., 2024).
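The three regimes differ only in which slice of the history the belief function may read. A minimal sketch, assuming observations are strings and `None` marks a step where the agent observed nothing; the helper names and the “most recent non-empty observation” update rule are hypothetical choices for illustration.

```python
from typing import List, Optional, Sequence

# Observations o_1..o_t; None means "the agent observed nothing at this step" (assumption).
Obs = Optional[str]


def belief_inputs(history: Sequence[Obs], regime: str, k: int = 0) -> List[Obs]:
    """Return the slice of the history each regime's belief function may read."""
    if regime == "ZBH":
        return list(history[-1:])        # only the current context c_t
    if regime == "FBH":
        if k < 0:
            raise ValueError("k must be non-negative")
        return list(history[-(k + 1):])  # o_{t-k}, ..., o_t
    if regime == "IBH":
        return list(history)             # o_1, ..., o_t
    raise ValueError(f"unknown regime: {regime}")


def last_seen(inputs: Sequence[Obs]) -> Obs:
    """Hypothetical belief function f: the most recent non-empty observation."""
    for o in reversed(inputs):
        if o is not None:
            return o
    return None


history = ["item in slot 2", None, None]  # saw the item once, then nothing
for regime, k in [("ZBH", 0), ("FBH", 1), ("FBH", 2), ("IBH", 0)]:
    print(regime, k, last_seen(belief_inputs(history, regime, k)))
```

With this update rule, ZBH and FBH with $k = 1$ cannot recover the earlier sighting, while FBH with $k = 2$ and IBH can; FBH with $k = 0$ reduces to ZBH.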
3. ZBH in Machine Theory of Mind: The "Pick the Right Stuff" Benchmark
Tang & Belle (Tang et al., 2024) instantiate ZBH using a multi-round text-based task, “Pick the Right Stuff,” in which the LLM acts as a warehouse manager tracking users’ beliefs about object locations. Each round contains:
- Initial setup: N users each place an item in a numbered slot of an opaque locker (Room 1); Room 2 (the Monitoring Room) contains a monitor that shows the current locker layout.
- Randomized shuffling: The locker malfunctions and items’ slots are randomly reassigned; the monitor updates live.
- Partial observation: Users may or may not re-enter Room 2 to update their beliefs.
- Belief query: When a user returns to retrieve an item, the LLM is queried: which slot does the user believe contains their item?
- Scoring: 1 point is assigned per correct prediction; rounds continue until all items are retrieved.
In pure ZBH trials, the answer is computable from the current locker state and the list of users who have or have not observed the monitor since the last shuffle; no tracking of earlier observations or composite events is required. Example: if user 1 did not re-enter Room 2 after either of two shuffles, the LLM should predict that user 1 still believes their item is in the slot they last saw (Tang et al., 2024).
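A single round can be simulated to show how the ZBH answer and the scoring fit together. This is a toy sketch, not the authors’ harness: it assumes one shuffle per round, 10 slots, and that each user checks the monitor with probability 0.5, and all names are hypothetical. `true_belief` encodes the ZBH rule above (current layout if the user saw the monitor after the shuffle, otherwise the slot they last saw), and `reality_baseline` is a naive predictor that reports the true location.

```python
import random
from typing import Callable, Dict, Set

Layout = Dict[str, int]  # user -> slot holding that user's item
Predictor = Callable[[str, Layout, Layout, Set[str]], int]


def true_belief(user: str, initial: Layout, current: Layout, observed: Set[str]) -> int:
    # A user who checked the monitor after the shuffle knows the new slot;
    # otherwise they still believe the slot they last saw (the initial one).
    return current[user] if user in observed else initial[user]


def reality_baseline(user: str, initial: Layout, current: Layout, observed: Set[str]) -> int:
    # Hypothetical naive predictor: reports where the item really is,
    # ignoring whether the user could have seen the change.
    return current[user]


def play_round(predict: Predictor, n_users: int = 5, n_slots: int = 10, seed: int = 0) -> int:
    rng = random.Random(seed)
    users = [f"user{i}" for i in range(1, n_users + 1)]
    # Initial setup: each user puts one item in a distinct locker slot.
    initial = dict(zip(users, rng.sample(range(1, n_slots + 1), n_users)))
    # Malfunction: items are permuted among the occupied slots (one shuffle, assumption).
    slots = list(initial.values())
    rng.shuffle(slots)
    current = dict(zip(users, slots))
    # Partial observation: each user checks the monitor with probability 0.5 (assumption).
    observed = {u for u in users if rng.random() < 0.5}
    # Scoring: one point per correct belief prediction as users retrieve items.
    return sum(
        predict(u, initial, current, observed) == true_belief(u, initial, current, observed)
        for u in users
    )


print(play_round(true_belief), play_round(reality_baseline))
```

The baseline only scores on users who saw the monitor or whose item happened not to move, which is why false-belief cases separate perspective-taking from simply reporting the world state.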
4. Evaluation Methods and Empirical Results
Six LLMs, ranging from 7B to 72B parameters, were evaluated on both the ZBH and FBH versions of the benchmark: gpt-3.5-turbo (parameter count not officially disclosed by OpenAI; listed as 26B), llama3:70b-instruct (70B), qwen:72b-chat (72B), gemma:7b-instruct (7B), mistral:7b-instruct (7B), and qwen:7b-chat (7B). Each model played 60-turn runs with 5 users per game. The principal metric was the average score, in points, over a run.
Results:
| Model | ZBH Score | FBH Score |
|---|---|---|
| gemma:7b-instruct | 43.00 | 34.33 |
| mistral:7b-instruct | 40.00 | 30.67 |
| qwen:7b-chat | 34.33 | 25.33 |
| llama3:70b-instruct | 31.00 | 28.33 |
| gpt-3.5-turbo | 30.67 | 25.33 |
| qwen:72b-chat | 28.33 | 25.33 |
Conclusions:
- All models scored higher on ZBH than on FBH (mean gap ≈6.3 points across the six models in the table), consistent with ZBH tasks being easier for current LLMs.
- All three 7B models outscored llama3:70b-instruct, qwen:72b-chat, and gpt-3.5-turbo on ZBH, and gemma and mistral did so on FBH as well. This suggests that inductive biases and pretraining regimen, not merely scale, affect ToM-related capabilities (Tang et al., 2024).
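The per-model ZBH − FBH gaps and their mean can be recomputed directly from the results table. The snippet below copies the table values; the dictionary name is an illustrative choice.

```python
from statistics import mean

# (ZBH score, FBH score) per model, copied from the results table.
SCORES = {
    "gemma:7b-instruct": (43.00, 34.33),
    "mistral:7b-instruct": (40.00, 30.67),
    "qwen:7b-chat": (34.33, 25.33),
    "llama3:70b-instruct": (31.00, 28.33),
    "gpt-3.5-turbo": (30.67, 25.33),
    "qwen:72b-chat": (28.33, 25.33),
}

# Positive gap = the model scored higher on ZBH than on FBH.
gaps = {model: zbh - fbh for model, (zbh, fbh) in SCORES.items()}
for model, gap in gaps.items():
    print(f"{model:22s} {gap:5.2f}")
print(f"mean gap: {mean(gaps.values()):.3f}")  # mean gap: 6.335
```

Every gap is positive, ranging from about 2.67 (llama3:70b-instruct) to 9.33 (mistral:7b-instruct).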
5. ZBH in Dynamic Economic Models: Subjective Bubbles
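The economic interpretation of ZBH is developed in Larsson’s model of dynamic equilibrium with belief heterogeneity (Larsson, 2013). Here, agent $i$’s beliefs are a subjective probability measure $\mathbb{Q}^i$, absolutely continuous with respect to the reference measure $\mathbb{P}$ (but not necessarily equivalent to it), with density process

$$Z_t^i = \left.\frac{d\mathbb{Q}^i}{d\mathbb{P}}\right|_{\mathcal{F}_t}.$$

Define the zero-belief time (“bankruptcy time”)

$$\tau^i = \inf\{t \ge 0 : Z_t^i = 0\}, \qquad \inf\emptyset = \infty.$$

Any path $\omega$ with $\tau^i(\omega) < \infty$ is a zero-belief history for agent $i$.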
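A discrete-time toy makes $Z_t^i$ and $\tau^i$ concrete. The sketch assumes a two-step binomial tree, a uniform reference measure $\mathbb{P}$, and a hypothetical belief `Q_A` that rules out any path starting with a down-move; on such a tree, $Z_t^i$ on a path equals $\mathbb{Q}^i(A_t)/\mathbb{P}(A_t)$, where $A_t$ is the set of paths sharing its first $t$ moves. Larsson’s model is in continuous time; this only illustrates the definitions.

```python
import math
from fractions import Fraction
from itertools import product

T = 2
PATHS = ["".join(p) for p in product("ud", repeat=T)]  # "uu", "ud", "du", "dd"

# Reference measure P: each path equally likely (assumption).
P = {w: Fraction(1, 2 ** T) for w in PATHS}

# Hypothetical belief Q^A: absolutely continuous w.r.t. P but not equivalent,
# since it assigns zero mass to every path that starts with a down-move.
Q_A = {"uu": Fraction(1, 2), "ud": Fraction(1, 2), "du": Fraction(0), "dd": Fraction(0)}


def mass(measure, prefix):
    """Probability of the F_t-atom: all paths sharing the first len(prefix) moves."""
    return sum((m for w, m in measure.items() if w.startswith(prefix)), Fraction(0))


def density_process(Q, P, path):
    """Z_t = Q(atom_t) / P(atom_t) for t = 0, ..., T along the given path."""
    return [mass(Q, path[:t]) / mass(P, path[:t]) for t in range(len(path) + 1)]


def zero_belief_time(Q, P, path):
    """tau = inf{t : Z_t = 0}, with the infimum of the empty set = infinity."""
    for t, z in enumerate(density_process(Q, P, path)):
        if z == 0:
            return t
    return math.inf


for w in PATHS:
    print(w, [str(z) for z in density_process(Q_A, P, w)], zero_belief_time(Q_A, P, w))
```

Paths "du" and "dd" have $Z_1 = 0$ and hence $\tau = 1$: they are the zero-belief histories of this agent, while "uu" and "ud" have $\tau = \infty$.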
In this model, the asset price $S_t$ decomposes into a fundamental and a bubble component:

$$S_t = S_t^{*,i} + \beta_t^i,$$

where $S_t^{*,i}$ is agent $i$’s subjective fundamental value, and the bubble component is

$$\beta_t^i = S_t - S_t^{*,i}.$$

Zero-belief histories, the scenarios after $\tau^i$ that agent $i$ excludes as impossible, drive the subjective bubble: the market price $S_t$ incorporates cash flows on histories that agent $i$ ignores when computing $S_t^{*,i}$. As a result, agents with different zero-belief times (null sets) compute different bubbles (Larsson, 2013).
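The mechanism can be illustrated in a finite toy market. This is a sketch under strong assumptions, not Larsson’s construction: the reference measure $\mathbb{P}$ also serves as the pricing measure, interest rates are zero, the asset pays a single cash flow $X$ at $T = 2$, and an agent’s fundamental value counts only the cash flows on paths it assigns positive probability, i.e. $S_0^{*,i} = \mathbb{E}^{\mathbb{P}}[X \mathbf{1}_{\{\tau^i > T\}}]$. The beliefs `A` and `B` are hypothetical.

```python
from fractions import Fraction
from itertools import product

T = 2
PATHS = ["".join(p) for p in product("ud", repeat=T)]

# Assumptions: P is both the reference and the pricing measure, interest rates
# are zero, and the asset pays a single cash flow X at time T.
P = {w: Fraction(1, 4) for w in PATHS}
X = {w: Fraction(1) for w in PATHS}  # pays 1 on every path

# Hypothetical beliefs with different null sets (zero-belief histories):
# agent A rules out a first down-move; agent B rules out only the path "ud".
BELIEFS = {
    "A": {"uu": Fraction(1, 2), "ud": Fraction(1, 2), "du": Fraction(0), "dd": Fraction(0)},
    "B": {"uu": Fraction(1, 3), "ud": Fraction(0), "du": Fraction(1, 3), "dd": Fraction(1, 3)},
}


def market_price():
    """S_0: pricing-measure value of the cash flow on all paths."""
    return sum(P[w] * X[w] for w in PATHS)


def subjective_fundamental(q):
    """S*_0 for an agent: value only the paths the agent assigns positive mass."""
    return sum(P[w] * X[w] for w in PATHS if q[w] > 0)


def subjective_bubble(q):
    """beta_0 = S_0 - S*_0, i.e. the value of cash flows on the agent's zero-belief paths."""
    return market_price() - subjective_fundamental(q)


for name, q in BELIEFS.items():
    print(name, subjective_fundamental(q), subjective_bubble(q))
```

Both agents face the same market price of 1, but agent A computes a bubble of 1/2 and agent B a bubble of 1/4, because their null sets differ.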
6. Significance, Insights, and Implications
In LLM-based ToM research, ZBH delineates the boundary between immediate, context-based inference and more complex, temporally extended perspective-taking. LLMs’ comparatively high scores on ZBH suggest that pretrained models can simulate first-order false beliefs (“I last observed X”) using only present data.
The consistent drop in performance from ZBH to FBH tasks points to concrete limitations: LLMs struggle to chain multiple prior beliefs or to integrate several observation slices. Because smaller models sometimes outperform larger ones, this bottleneck in multi-step reasoning is not explained by parameter count alone.
ZBH provides a tractable and analytically clean class of benchmarks for comparative evaluation of ToM capabilities. Extensions to FBH and IBH allow for systematic stress-testing of historical reasoning, perspective memory, and procedural belief computation. The paradigm is relevant to both AI safety (robust perspective-tracking in real-world systems) and cognitive modeling (delineating the structure of belief inference in agents).
In dynamic financial equilibrium, ZBH identifies the event sets (null sets) on which subjective probability measures diverge. Disagreement about these histories is sufficient, without trading or portfolio restrictions, to generate subjective equilibrium asset price bubbles. Consequently, ZBH formalizes the informational “blind spots” that yield persistent, agent-dependent deviations between fundamental and observed market values.
Future work includes diversifying ZBH environments to encompass social rules, mathematical proof structures, and implicit procedural chains, with the aim of mapping the limits and strengths of both artificial and human theory of mind reasoning (Tang et al., 2024; Larsson, 2013).