- The paper presents a formal solution to the grain of truth problem by constructing reflective-oracle computable strategies that include all computable and Bayes-optimal policies.
- It leverages reflective oracles to overcome recursive reasoning challenges, ensuring convergence to ε-Nash equilibria in both known and unknown computable games.
- The construction is computationally grounded: the strategy class is effectively enumerable and all key objects are limit-computable, paving the way for self-predictive agents and advanced multi-agent learning frameworks.
This paper addresses the grain of truth problem in Bayesian multi-agent learning within arbitrary computable extensive-form games. The grain of truth problem, originally posed by Kalai and Lehrer, asks whether it is possible to construct a sufficiently rich class of strategies such that every Bayes-optimal policy (with respect to this class) is itself contained within the class, allowing for mutually consistent beliefs among Bayesian agents. Previous work established only limited classes with this property, and several impossibility results suggested that the problem is intractable for general strategy classes.
The authors present a formal solution by constructing a class of reflective-oracle computable strategies (Prefl) that includes all computable strategies and Bayes-optimal strategies for any reasonable prior. This construction leverages reflective oracles to resolve the infinite regress of agents reasoning about each other's reasoning, enabling consistent Bayesian inference in multi-agent settings.
Reflective Oracles and Computability Foundations
Reflective oracles are central to the solution. They allow probabilistic Turing machines (pTMs) to query about their own behavior, circumventing diagonalization barriers and enabling self-referential reasoning. The paper extends the definition of reflective oracles to non-binary alphabets and introduces typed oracles to handle distinct action and percept spaces.
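As a toy illustration (not the paper's construction), a reflective oracle can be thought of as answering threshold queries about a probabilistic machine's output distribution, with license to randomize exactly at the threshold; this slack is what lets a consistent oracle exist despite diagonalization. Here the machine is represented directly by its output probability, a simplification the real construction cannot make:

```python
import random

def oracle(prob_of_one: float, p: float) -> int:
    """Answer the query "does the machine output 1 with probability > p?".

    Returns 1 if the probability exceeds p, 0 if it is below p, and an
    arbitrary (randomized) answer exactly at the boundary: the freedom
    that makes a consistent reflective oracle possible.
    """
    if prob_of_one > p:
        return 1
    if prob_of_one < p:
        return 0
    return random.randint(0, 1)
```

In the actual definition, queries name a probabilistic Turing machine that may itself call the oracle, and the existence proof shows that some assignment of answers is consistent with all machines' induced measures.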
Key computability notions are formalized:
- Limit-computable functions: Limits of computable sequences of approximations; the approximation becomes arbitrarily precise, but no computable bound on the error is available at any finite stage.
- Lower semicomputable (l.s.c.) functions: Approximable from below.
- Estimable functions: Approximable to arbitrary pre-specified precision.
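A small numerical sketch (illustrative, not from the paper) of the difference between the first and last notions, using the series for e: a limit-computable value is the limit of a computable sequence with no known error bound, while an estimable value comes with a stopping rule for any requested precision.

```python
def limit_approx(n: int) -> float:
    """n-th term of a computable sequence converging to e (partial sums of 1/k!).

    Limit-computable flavor: the sequence converges, but nothing here tells
    the caller how large n must be for a desired accuracy.
    """
    total, fact = 0.0, 1.0
    for k in range(n + 1):
        total += 1.0 / fact
        fact *= k + 1
    return total

def estimate(eps: float) -> float:
    """Estimable flavor: stop once a known tail bound (2/fact) drops below eps."""
    total, fact, k = 0.0, 1.0, 0
    while 2.0 / fact >= eps:
        total += 1.0 / fact
        fact *= k + 1
        k += 1
    return total
```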
The authors prove that for any pTM, the induced semimeasure is l.s.c., and conversely, any l.s.c. semimeasure can be sampled by a pTM. With reflective oracle access, these results generalize to O-sampled and O-estimable semimeasures, forming the basis for the strategy class Prefl.
Multi-Agent Game Model and Strategy Class Construction
The paper formalizes multi-agent games as functions mapping histories and action profiles to distributions over percepts. Each agent's strategy is a mapping from its local history to a distribution over actions. The subjective environment for each agent is defined by marginalizing over the other agents' actions and percepts.
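A minimal sketch of this interface (the names and the matching-pennies payoff are illustrative; the paper's games are fully general):

```python
import random

def matching_pennies(histories, actions):
    """Game step: map the joint history and action profile to one percept per
    agent (deterministic here; in general a distribution). Agent 0's reward
    is 1 on a match, agent 1's on a mismatch."""
    match = actions[0] == actions[1]
    return (int(match), int(not match))

def uniform_strategy(local_history):
    """A stochastic strategy: a distribution over actions given local history."""
    return [0.5, 0.5]

def play(game, strategies, steps, seed=0):
    rng = random.Random(seed)
    histories = [[] for _ in strategies]  # each agent sees only its own history
    for _ in range(steps):
        actions = tuple(rng.choices([0, 1], weights=s(h))[0]
                        for s, h in zip(strategies, histories))
        percepts = game(histories, actions)
        for h, a, e in zip(histories, actions, percepts):
            h.append((a, e))  # an agent observes only its own action and percept
    return histories
```

Marginalizing over the other agents' hidden actions and percepts is what turns this joint process into each agent's subjective environment.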
The strategy class Prefl consists of all reflective-oracle computable strategies. The authors show that Prefl is effectively enumerable and contains a dominant mixture policy ζ that multiplicatively dominates all other strategies in the class. This dominance property is crucial for establishing the grain of truth property.
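The dominance property can be sketched in a few lines (a finite truncation with made-up policies; the paper's ζ mixes over an enumeration of all of Prefl): with prior weights w_i > 0, the mixture zeta(a | h) = Σ_i w_i π_i(a | h) satisfies zeta(a | h) ≥ w_i π_i(a | h) for every i, so each strategy retains a "grain" of size at least w_i.

```python
def mixture(policies, weights, history, action):
    """zeta(action | history) as a weighted sum over the class."""
    return sum(w * p(history)[action] for p, w in zip(policies, weights))

def always_zero(history):
    return [1.0, 0.0]

def uniform(history):
    return [0.5, 0.5]

policies, weights = [always_zero, uniform], [0.5, 0.5]
# Multiplicative dominance: the mixture never assigns less than w_i times
# what policy i assigns, at every history and action.
for action in (0, 1):
    z = mixture(policies, weights, [], action)
    assert all(z >= w * p([])[action] for p, w in zip(policies, weights))
```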
Existence of Reflective-Oracle Computable Nash Equilibria
The authors construct Nash equilibria in the class of reflective-oracle computable strategies. For any computable multi-agent game, they show that mutually optimal response strategies exist and are reflective-oracle computable. The construction uses Kleene's second recursion theorem to resolve the circular dependencies among agents' strategies.
The value function for each agent is defined as the expected sum of discounted rewards, and optimal strategies are constructed via reflective-oracle guided maximization. The Nash equilibrium obtained is subgame perfect, as agents act optimally even on histories that are reached with probability zero.
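In symbols (notation hedged; the paper works with a general summable discount sequence γ_t and normalizer Γ_t), agent i's value of the strategy profile σ after history h is

```latex
V_i^{\sigma}(h) \;=\; \frac{1}{\Gamma_{|h|}}\,
  \mathbb{E}^{\sigma}\!\left[\,\sum_{t=|h|}^{\infty} \gamma_t\, r_{i,t} \;\middle|\; h \right],
\qquad \Gamma_t := \sum_{k=t}^{\infty} \gamma_k,
```

and a best-response strategy selects, at each history, an action maximizing this value against the agent's beliefs about the other players.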
Convergence of Bayesian Agents and Grain of Truth Property
The paper proves that Bayesian agents with priors supported on Prefl converge to ε-Nash equilibria in infinitely repeated computable games. The construction of the dominant mixture policy ζ ensures that every Bayes-optimal strategy is assigned nonzero probability, satisfying the grain of truth property.
For unknown games, the authors extend the analysis to Thompson sampling strategies, showing that agents using limit-computable Thompson sampling policies converge to ε-Nash equilibria in arbitrary unknown computable multi-agent environments. The strong grain of truth property is established for the class Prefl and the corresponding environment class Mrefl.
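A toy sketch of the Thompson sampling step (a two-armed bandit stand-in for the general environment class Mrefl; all names are illustrative): the agent samples one environment hypothesis from its posterior, acts optimally for it, and updates the posterior on the observed percept.

```python
import random

def thompson_act(posterior, envs, rng):
    """Sample an environment hypothesis from the posterior, then act optimally
    for it. envs[i][a] is the reward probability of arm a under hypothesis i."""
    i = rng.choices(range(len(envs)), weights=posterior)[0]
    return max(range(len(envs[i])), key=lambda a: envs[i][a])

def bayes_update(posterior, envs, arm, reward):
    """Posterior update on a single observed reward bit."""
    new = [w * (env[arm] if reward else 1.0 - env[arm])
           for w, env in zip(posterior, envs)]
    total = sum(new)
    return [w / total for w in new]
```

The paper's version, roughly, resamples on a schedule of growing effective horizons and follows the sampled environment's optimal policy over each segment; that schedule is what drives convergence in mean to ε-Nash equilibria.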
Impossibility Results and Avoidance
The paper discusses classical impossibility results (Nachbar, Foster & Young) and shows that the constructed class Prefl avoids them by failing the purity condition: Prefl need not contain, for a given stochastic policy, a deterministic policy that always selects actions to which the stochastic policy assigns positive probability. The countability and computability constraints on Prefl prevent the pathologies that lead to impossibility in uncountable or unrestricted strategy classes.
Asymptotic Optimality in Unknown Games
The authors generalize the convergence results to settings where agents are not initially aware of the game or the existence of other agents. By considering the class of reflective-oracle computable environments (Mrefl), they show that asymptotically optimal policies (in mean) converge to ε-Nash equilibria. Thompson sampling is shown to be reflective-oracle computable under estimable priors, and the limit-computability of reflective oracles ensures practical approximability.
Application to Self-Predictive Agents
A novel application is presented: the construction of self-predictive agents (Self-AIXI) that maintain consistent beliefs about their own future policy. The machinery developed for the grain of truth problem enables the definition of a stochastic self-predictive policy within Prefl, providing a principled alternative to planning-based RL agents.
Implementation Considerations
- Enumerability: Both Prefl and Mrefl are effectively enumerable, allowing for practical implementation of Bayesian mixtures and Thompson sampling.
- Limit-computability: All key constructions (reflective oracles, mixture policies, optimal strategies) are limit-computable, enabling arbitrary precision approximation.
- Typed Oracles: The extension to non-binary and typed reflective oracles supports heterogeneous action and percept spaces, facilitating deployment in complex multi-agent systems.
- Resource Requirements: The limit-computable constructions can in principle be approximated on real hardware, though simulating Turing machines and approximating reflective oracles carries substantial computational cost.
Implications and Future Directions
The results provide a rigorous foundation for Bayesian learning in general multi-agent environments, justifying the emergence of Nash equilibria from rational learning dynamics. The framework supports both known and unknown games, and the use of reflective oracles resolves longstanding issues in recursive reasoning and self-prediction.
Future research directions include:
- Characterizing the centrality and uniqueness of reflective oracles among solutions to the grain of truth problem.
- Investigating the intersection and union of Prefl across different reflective oracles.
- Exploring the practical deployment of reflective-oracle based agents in real-world multi-agent systems, including human-computer interaction and cooperative AI architectures.
Conclusion
The paper provides a comprehensive solution to the grain of truth problem for arbitrary computable extensive-form games, constructing a limit-computable class of strategies and environments that supports consistent Bayesian learning and convergence to Nash equilibria. The use of reflective oracles enables principled recursive reasoning and self-prediction, with broad implications for the theory and practice of multi-agent reinforcement learning and game theory.