
Mind-like Inference in Sequential Games

Updated 29 December 2025
  • The paper presents a unified framework for modeling latent mental states and sequential decision-making via Bayesian inversion and bounded rationality.
  • It details methodologies including feature-driven decision rules, inverse Bayesian inference, and recursive belief updates to predict agents' strategies.
  • The study illustrates applications in AI, multi-agent games, and neuro-inspired control architectures with interpretable evaluation metrics.

Mind-like inference in sequential games denotes the computational principles and mechanisms by which agents attribute, infer, or exploit latent mental states—goals, beliefs, intentions, or bounded-rational biases—of other agents or themselves in extensive-form games where decisions unfold over multiple stages. Spanning cognitive modeling, AI system design, and formal game theory, mind-like inference synthesizes elements from Theory of Mind (ToM), bounded rationality, forward induction, abduction, and control-theoretic architectures to explain and operationalize anticipatory, belief-driven, or “mentalizing” reasoning in interactive environments.

1. Cognitive and Computational Foundations

Mind-like inference develops from the need to model, induce, and predict agents’ internal motivational states and strategic reasoning in dynamic social or competitive settings. Canonical ToM abilities encompass inferring the unobservable preferences, intentions, and beliefs of other agents based on observation of their sequential actions. As formalized in player modeling and behavioral inference frameworks, these latent aspects are instantiated as parameters in feature-driven decision rules governing observable behavior. Cognitive models of sequential decision-making frequently posit that decision-makers select actions based on a weighted combination of observable game features, subject to stochastic choice rules:

U_X(x_o;\mathcal G_n,\theta) = \sum_{r=1}^R w_r\,[g_{r,x_o}(\mathcal G_n)-\delta_r]

p(X_n = x_{n,o}\mid\mathcal G_n,\theta) = \frac{\exp(U_X(x_{n,o}))}{\sum_{o'}\exp(U_X(x_{n,o'}))}

Parameters θ encapsulate latent behavior tendencies (e.g., feature weights, thresholds), whose inference from action sequences constitutes a form of mind-like model inversion. For instance, in sequential move-choice environments such as BoomTown, inverse Bayesian inference over these parameters (with priors and likelihoods defined by the cognitive choice dynamics) directly operationalizes ToM (Shergadwala et al., 2021).
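A minimal Python sketch of this inversion, assuming a toy environment with three options, two features, and a discrete grid prior over θ (feature values, the grid, and function names are illustrative, not taken from the cited work):

```python
import numpy as np

def utility(features, weights, thresholds):
    """U_X(x_o) = sum_r w_r * (g_{r,x_o} - delta_r) for each option x_o."""
    return (features - thresholds) @ weights  # shape: (n_options,)

def choice_probs(features, weights, thresholds):
    """Softmax (logit) stochastic choice rule over options."""
    u = utility(features, weights, thresholds)
    e = np.exp(u - u.max())  # subtract max for numerical stability
    return e / e.sum()

def posterior_over_theta(actions, feature_history, theta_grid):
    """Inverse Bayesian inference: P(theta | observed actions) on a grid.

    actions:         chosen option index per round
    feature_history: one (n_options, n_features) array per round
    theta_grid:      list of (weights, thresholds) hypotheses, uniform prior
    """
    log_post = np.zeros(len(theta_grid))
    for i, (w, d) in enumerate(theta_grid):
        for feats, choice in zip(feature_history, actions):
            log_post[i] += np.log(choice_probs(feats, w, d)[choice])
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: data generated by hypothesis 0; the posterior recovers it.
rng = np.random.default_rng(0)
feats = [rng.random((3, 2)) for _ in range(20)]
true_w, true_d = np.array([2.0, -1.0]), np.array([0.5, 0.5])
actions = [rng.choice(3, p=choice_probs(f, true_w, true_d)) for f in feats]
grid = [(true_w, true_d), (np.array([-1.0, 2.0]), true_d)]
print(posterior_over_theta(actions, feats, grid))
```

The posterior mass concentrating on the generating hypothesis is exactly the ToM read-out: the observer recovers the latent feature weights that explain the action sequence.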

2. Bounded Rationality and Psychological Distortions

Real-world agents rarely conform to perfect rationality: psychological evidence (e.g., Anchoring Theory, Quantal Response) indicates systematic deviations. Mind-like inference thus requires explicit modeling of such deviations as operator transformations on standard best-response calculations. In sequential Stackelberg games, the Anchoring Theory in Sequential Stackelberg Games (ATSG) model formally distorts the perceived action probabilities of a leader by a follower towards the uniform distribution:

pa=(1α)pa+αAp'_a = (1-\alpha) p_a + \frac{\alpha}{|A|}

The leader, anticipating this flattening, solves an optimization where the follower’s best response is computed to the distorted strategy. Both exact (MILP) and heuristic (Monte Carlo, evolutionary) solution methods adjust only the follower’s payoff oracle, yielding a modular framework for embedding bounded rationality in extensive-form inference (Karwowski et al., 2019). Extensions encompass quantal response, prospect-theory weights, and more, each yielding tractable mind-like distortions in either linear or piecewise-linear sequence form.
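A minimal sketch of the anchoring operator and the follower’s response to the distorted strategy, reduced to a one-shot matrix Stackelberg game (the ATSG paper treats the sequential case with MILP and heuristic solvers; the payoffs and grid search here are illustrative):

```python
import numpy as np

def anchor(p, alpha):
    """Distort the follower's perception of the leader's mixed strategy
    toward uniform: p'_a = (1 - alpha) * p_a + alpha / |A|."""
    return (1.0 - alpha) * p + alpha / len(p)

def follower_best_response(p_leader, follower_payoff, alpha):
    """Follower best-responds to the *anchored* leader strategy.
    follower_payoff: rows = leader actions, cols = follower actions."""
    expected = anchor(p_leader, alpha) @ follower_payoff
    return int(np.argmax(expected))

def leader_value(p_leader, leader_payoff, follower_payoff, alpha):
    """Leader's true expected payoff given the follower's anchored response."""
    b = follower_best_response(p_leader, follower_payoff, alpha)
    return p_leader @ leader_payoff[:, b]

# Toy 2x2 game; leader grid-searches mixed strategies against the
# anchored follower (alpha = 0.3).
L = np.array([[2.0, 0.0], [0.0, 1.0]])   # leader payoffs
F = np.array([[1.0, 0.0], [0.0, 2.0]])   # follower payoffs
best = max(((q, leader_value(np.array([q, 1 - q]), L, F, alpha=0.3))
            for q in np.linspace(0, 1, 101)), key=lambda t: t[1])
print(f"leader plays action 1 with prob {best[0]:.2f}, value {best[1]:.3f}")
```

Note that only `follower_best_response` differs from the fully rational model, mirroring the modularity of adjusting just the follower’s payoff oracle.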

3. Hierarchical and Recursive Models of Other Minds

Theory-of-Mind capable agents may reason recursively about others’ mental states. In a computable game-theoretic framework for multi-agent ToM, each agent maintains probabilistic beliefs about the “level” of strategic sophistication of opponents (cognitive hierarchy), parameterized via a Poisson distribution with conjugate Gamma updating:

f(k;\lambda) = e^{-\lambda}\,\frac{\lambda^k}{k!}, \qquad \Lambda \sim \mathrm{Gamma}(a,b)

Policy construction is recursive: the level-(k+1) policy is a best response to a mixture or singleton profile over lower levels, solved either exactly or via the QMDP approximation in MDPs. Each round, observed play updates the posterior on λ, reconstructs the support hierarchy, and computes an optimal (boundedly rational) response. All key steps—belief update, recursion, best-response—are computable in polynomial time per round for bounded K (Zhu et al., 27 Nov 2025).
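A minimal sketch of the conjugate update and the level-k recursion, simplified to a one-shot matrix game (the paper’s setting is MDPs with the QMDP approximation; weighting levels by a Poisson at the posterior mean is an illustrative shortcut for the full posterior predictive):

```python
import math
import numpy as np

def gamma_poisson_update(a, b, observed_levels):
    """Conjugate update: Lambda ~ Gamma(a, b) (rate b), k_i ~ Poisson(Lambda)
    yields posterior Gamma(a + sum(k_i), b + n)."""
    return a + sum(observed_levels), b + len(observed_levels)

def level_weights(a, b, K):
    """Approximate P(level = k) by a Poisson at the posterior mean a/b,
    truncated to levels 0..K and renormalized."""
    lam = a / b
    w = np.array([lam**k * math.exp(-lam) / math.factorial(k)
                  for k in range(K + 1)])
    return w / w.sum()

def level_policy(payoff, opp_payoff, k):
    """Level-0 plays uniformly; level-(k+1) best-responds to level-k.
    Payoff matrices are (own actions x opponent actions)."""
    if k == 0:
        return np.full(payoff.shape[0], 1.0 / payoff.shape[0])
    opp = level_policy(opp_payoff, payoff, k - 1)  # opponent, one level down
    pi = np.zeros(payoff.shape[0])
    pi[np.argmax(payoff @ opp)] = 1.0
    return pi

def respond_to_hierarchy(payoff, opp_payoff, a, b, K):
    """Best response to the posterior mixture over opponent levels 0..K."""
    mix = sum(w * level_policy(opp_payoff, payoff, k)
              for k, w in enumerate(level_weights(a, b, K)))
    return int(np.argmax(payoff @ mix))

# Matching pennies, after inferring opponent levels 1, 2, 2 from play.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
a, b = gamma_poisson_update(1.0, 1.0, observed_levels=[1, 2, 2])
print(respond_to_hierarchy(A, -A.T, a, b, K=3))
```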

4. Predictive Control and Feedback Architectures

Neural-inspired and cybernetic approaches capture mind-like inference as layered adaptive control systems. Agents utilize a hierarchy:

  • Reactive Layer: Fast, sensory-driven responses (e.g., Braitenberg vehicles for approach/avoid).
  • Adaptive Layer: Slow learning and belief updates about both the environment and the other agent’s likely policies.

Each round, the agent predicts the opponent’s action, inhibits its own reflexes accordingly, and updates beliefs based on observed mismatch (prediction error signals). Separate TD learners are maintained for “what the other will do” (predictor) and “what action should I take” (actor-critic), with prediction and action policies updated according to intrinsic and extrinsic rewards, respectively. This produces both superior game-theoretic performance and interpretable, real-time belief adaptation (Freire et al., 2019).
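A minimal sketch of the two separate TD learners, assuming a repeated game with a small discrete state space (layer inhibition and the specific reward shaping of the cited architecture are simplified away; class and variable names are illustrative):

```python
import numpy as np

class OpponentPredictor:
    """TD learner for 'what will the other do': tabular values over
    (state, opponent action), trained on an intrinsic prediction-error
    signal rather than game payoff."""
    def __init__(self, n_states, n_actions, lr=0.1):
        self.q = np.zeros((n_states, n_actions))
        self.lr = lr
    def predict(self, s):
        return int(np.argmax(self.q[s]))
    def update(self, s, predicted, actual):
        intrinsic = 1.0 if predicted == actual else -1.0  # match/mismatch
        self.q[s, predicted] += self.lr * (intrinsic - self.q[s, predicted])

class ActorCritic:
    """TD actor-critic for 'what should I do', driven by extrinsic reward."""
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9):
        self.v = np.zeros(n_states)
        self.prefs = np.zeros((n_states, n_actions))
        self.lr, self.gamma = lr, gamma
    def act(self, s, rng):
        p = np.exp(self.prefs[s] - self.prefs[s].max())  # softmax policy
        return int(rng.choice(len(p), p=p / p.sum()))
    def update(self, s, a, r, s_next):
        td_error = r + self.gamma * self.v[s_next] - self.v[s]
        self.v[s] += self.lr * td_error            # critic update
        self.prefs[s, a] += self.lr * td_error     # actor update

# One round: predict the opponent, act, then update both learners.
rng = np.random.default_rng(0)
pred, ac = OpponentPredictor(4, 2), ActorCritic(4, 2)
s = 0
guess = pred.predict(s)
a = ac.act(s, rng)
opp_a, r, s_next = 1, 0.5, 2   # observed outcome (toy values)
pred.update(s, guess, opp_a)
ac.update(s, a, r, s_next)
```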

5. Sequential Rationality, Forward Induction, and Conditional Dominance

Mind-like inference in extensive-form games extends to reasoning about the “rationality” and intentions of other players via forward induction:

  • Conditional B-Dominance quantifies rational eliminations of dominated strategies at each information set, conditioning only on strategies compatible with reaching that set.
  • Iterative Conditional B-Dominance (ICBD) applies successive elimination of such dominated strategies, producing a decreasing sequence of strategy sets and ultimately converging (under genericity) to the unique backward induction outcome (Guarino, 2023).

Formally, at each information set h, a strategy is eliminated if weakly dominated by another given the set of feasible opponent continuations, with the process iterated until convergence. This process instantiates “mind-like” forward reasoning: players update beliefs about feasible continuation paths in response to each other’s past rationality.
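A minimal sketch of the elimination loop, reduced to plain iterated weak dominance on a strategic form (full ICBD additionally conditions each elimination on the strategies compatible with reaching the information set; that conditioning is omitted here for brevity):

```python
from itertools import product
import numpy as np

def weakly_dominated(payoff, rows, cols):
    """Rows in `rows` weakly dominated by another surviving row,
    evaluated only against the surviving opponent strategies `cols`."""
    out = set()
    for r, r2 in product(rows, rows):
        if r == r2:
            continue
        diffs = [payoff[r2, c] - payoff[r, c] for c in cols]
        if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
            out.add(r)
    return out

def iterated_elimination(payoff_row, payoff_col):
    """Alternately remove weakly dominated strategies for both players,
    producing a decreasing sequence of strategy sets until a fixed point."""
    rows = set(range(payoff_row.shape[0]))
    cols = set(range(payoff_row.shape[1]))
    changed = True
    while changed:
        dr = weakly_dominated(payoff_row, rows, cols)
        dc = weakly_dominated(payoff_col.T, cols, rows)
        rows -= dr
        cols -= dc
        changed = bool(dr or dc)
    return rows, cols

# Toy 3x2 game (row payoffs R, column payoffs C); converges to ({0}, {0}).
R = np.array([[3, 1], [2, 2], [1, 1]])
C = np.array([[2, 1], [1, 1], [0, 0]])
print(iterated_elimination(R, C))
```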

6. Abductive Reasoning and ToM in Imperfect Information

In cooperative, partially observed domains, agents combine abductive reasoning with ToM to infer others’ knowledge and intentions. Each agent encodes its perception and domain knowledge as a logic program augmented by abducibles—possible explanations for gaps in its own knowledge. Observed actions are interpreted by inferring hypotheses (through abductive proof) that make the action optimal under the agent’s mental model. These hypotheses are then pruned and internalized as epistemic constraints, influencing future action selection and hypothesis formation. Such domain-independent architectures have been demonstrated in cooperative games such as Hanabi, where agents interpret partners’ actions as signals about hidden state (Montes et al., 2022).
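A minimal sketch of the abductive step, assuming a toy Hanabi-like hint scenario and a caller-supplied optimality test (the abducible names and the optimality rule are illustrative, not the cited system’s logic-program machinery):

```python
from itertools import combinations

def abduce(observed_action, abducibles, is_optimal):
    """Return minimal sets of abducible hypotheses H under which the
    observed action is optimal for the agent's mental model of the other."""
    explanations = []
    for size in range(len(abducibles) + 1):
        for H in combinations(abducibles, size):
            if is_optimal(observed_action, set(H)):
                # Keep only minimal explanations (no proper superset kept).
                if not any(set(e) <= set(H) for e in explanations):
                    explanations.append(H)
    return explanations

# Toy: the partner hinted 'red'; hypotheses concern which card is playable.
abducibles = ["card1_playable", "card2_playable"]

def is_optimal(action, hypotheses):
    # Toy rule: hinting red is only optimal if it marks a playable card.
    return action == "hint_red" and "card1_playable" in hypotheses

print(abduce("hint_red", abducibles, is_optimal))
# [('card1_playable',)]  -> internalized as an epistemic constraint
```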

7. Evaluation of Mind-like Inference in Machine Agents

Recent work evaluates LLMs as observers in sequential games (e.g., Rock-Paper-Scissors) to diagnose mind-like inference capabilities. LLMs are tasked with identifying latent (possibly adaptive or reactive) policies of agents, generating predictive distributions over outcomes, and providing chain-of-thought justifications. Evaluation metrics such as Union Loss (averaging normalized cross-entropy, Brier score, and expected-value discrepancy) and Strategy Identification Rate (SIR) expose both the fine-grained belief-updating behavior of models and their ability to recognize strategic patterns. While leading models, such as GPT-o3, demonstrate convergence to low loss and high SIR in static and some dynamic regimes, all current models struggle with deep reactive policies and fail to robustly identify human-inspired sequential patterns. This operationalizes a functional, interpretable benchmark for machine-based mind-like inference (Wang et al., 22 Dec 2025).
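A minimal sketch of the two metrics as described (the paper’s exact normalizations and weighting may differ; this version averages the three components uniformly):

```python
import numpy as np

def union_loss(pred, target, payoffs):
    """Average of normalized cross-entropy, Brier score, and
    expected-value discrepancy between predicted and target distributions."""
    pred = np.clip(pred, 1e-12, 1.0)
    ce = -np.sum(target * np.log(pred)) / np.log(len(pred))  # normalized CE
    brier = np.sum((pred - target) ** 2)                     # Brier score
    ev_gap = abs(pred @ payoffs - target @ payoffs)          # EV discrepancy
    return (ce + brier + ev_gap) / 3.0

def strategy_identification_rate(identified, truth):
    """Fraction of episodes where the latent policy was correctly named."""
    return float(np.mean([i == t for i, t in zip(identified, truth)]))

# Toy Rock-Paper-Scissors observation round.
target = np.array([0.6, 0.2, 0.2])    # true next-move distribution
pred = np.array([0.5, 0.3, 0.2])      # observer's predictive distribution
payoffs = np.array([1.0, 0.0, -1.0])  # outcome payoffs (illustrative)
print(union_loss(pred, target, payoffs))
print(strategy_identification_rate(["tit-for-tat", "uniform"],
                                   ["tit-for-tat", "cycler"]))  # 0.5
```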


The study of mind-like inference in sequential games thus interweaves formal cognitive modeling, bounded rationality, belief-updating processes, algorithmic solution concepts, and empirical validation. Techniques range from Bayesian inversion of latent preferences to bounded-rational best-response computation and logic-based abductive inference, providing rigorous, tractable, and interpretable mechanisms for embedding Theory-of-Mind and belief-driven reasoning in both human and artificial agents.
