Learning and Equilibrium under Model Misspecification

Published 14 Jan 2026 in econ.TH and math.ST | (2601.09891v1)

Abstract: This chapter develops a unified framework for studying misspecified learning situations in which agents optimize and update beliefs within an incorrect model of their environment. We review the statistical foundations of learning from misspecified models and extend these insights to environments with endogenous, action-dependent data, including both single-agent and strategic settings.


Authors (2)

Summary

The paper establishes a unified framework linking Bayesian updating with equilibrium analysis, demonstrating that posteriors concentrate on pseudo-true parameters under misspecification.
It presents the Berk–Nash equilibrium, which broadens traditional equilibrium concepts to account for endogenous data and model errors in both single-agent and strategic settings.
Using stochastic approximation and martingale techniques, the study delineates conditions under which actions, beliefs, and empirical frequencies converge despite model misspecification.

Learning and Equilibrium under Model Misspecification: A Technical Overview

Introduction and Context

This work establishes a unified analytical framework for the study of dynamic belief formation, Bayesian updating, and behavioral outcomes when agents operate under an incorrect model of the environment. It systematically synthesizes and extends the economic, statistical, and game-theoretic literature on model misspecification, providing formal characterizations of long-run beliefs and behavior in both single-agent and multi-agent environments. The framework accommodates a range of sources of misspecification, including behavioral biases, bounded rationality, and deliberate model reduction.

Five strands in the historical development of this literature are highlighted: (i) early connections between statistical learning and equilibrium; (ii) self-confirming equilibrium and adaptive learning; (iii) static equilibrium notions with exogenous misspecification; (iv) the unification with statistical misspecified learning and development of endogenously determined belief-action fixed points; and (v) a nascent literature considering model selection and recognition of misspecification.

Posterior Concentration under Misspecification

A substantial technical contribution is a clear mapping from classical Bayesian consistency results, especially those due to Schwartz and Berk, to environments with endogenous, action-dependent data. The paper shows rigorously that, even when the agent's model class is misspecified, the Bayesian posterior concentrates (exponentially fast) on the set of parameters within the model that minimize the Kullback–Leibler (KL) divergence to the true data-generating process. The asymptotic outcome is therefore not necessarily the truth but a “pseudo-true” parameter: the KL projection of the truth onto the model. The analysis distinguishes three regimes: correct specification (posterior consistency at the truth), misspecification with a unique KL projection (posterior consistency at the pseudo-true parameter), and misspecification with multiple KL minimizers (the posterior concentrates on the set of minimizers but may oscillate within it rather than converge).
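A minimal sketch of this concentration result with exogenous data (no actions), assuming a hypothetical Bernoulli truth with success probability 0.7 and a misspecified three-point model class {0.2, 0.4, 0.5} that excludes it. The code computes each parameter's KL divergence from the truth and the posterior after 2,000 draws; the posterior should pile up on 0.5, the KL minimizer, even though 0.5 is not the true parameter.

```python
import math
import random

def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence KL(Bern(p) || Bern(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def posterior_after(draws, thetas, prior):
    """Bayesian posterior over a finite parameter grid, computed in log space."""
    log_post = [math.log(w) for w in prior]
    for y in draws:
        for i, th in enumerate(thetas):
            log_post[i] += math.log(th if y == 1 else 1 - th)
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]  # subtract max to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]

TRUE_P = 0.7                # hypothetical true success probability (outside the model)
THETAS = [0.2, 0.4, 0.5]    # hypothetical misspecified model class: truth excluded
PRIOR = [1 / 3, 1 / 3, 1 / 3]

rng = random.Random(0)
draws = [1 if rng.random() < TRUE_P else 0 for _ in range(2000)]

kl = {th: kl_bernoulli(TRUE_P, th) for th in THETAS}
pseudo_true = min(kl, key=kl.get)   # KL projection of the truth onto the model
post = posterior_after(draws, THETAS, PRIOR)

print("KL divergences:", {th: round(v, 4) for th, v in kl.items()})
print("pseudo-true parameter:", pseudo_true)
print("posterior:", [round(w, 4) for w in post])
```

Working in log space and subtracting the maximum before exponentiating keeps the computation stable even though the raw likelihoods underflow for samples of this size.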

The technical apparatus rests on standard regularity conditions: compactness (of the parameter space), domination, almost-everywhere continuity, and full support (of the prior). Exponential concentration bounds for the posterior are derived, generalizing and formalizing Berk's results and integrating subsequent developments for nonparametric and dependent-data settings.
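For reference, the objects in this paragraph can be written as follows. The notation ($p$ for the true density, $q_\theta$ for the model density, $K$ for the KL divergence, $\mu_n$ for the posterior after $n$ observations) is assumed for illustration, and the second line is the generic form of a Berk-type concentration statement, not the chapter's exact theorem:

$$K(\theta) = \mathbb{E}_{p}\!\left[\ln \frac{p(Y)}{q_\theta(Y)}\right], \qquad \Theta^{*} = \arg\min_{\theta \in \Theta} K(\theta),$$

$$\text{for every open } U \supseteq \Theta^{*}\ \text{there is } \delta = \delta(U) > 0 \text{ such that, almost surely, } \mu_n(\Theta \setminus U) \le e^{-\delta n} \text{ for all large } n.$$

Compactness and continuity are what make $\delta$ strictly positive: outside $U$, the KL divergence is bounded away from its minimum, so the log posterior odds against $\Theta \setminus U$ grow linearly in $n$.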

Endogenous Learning: Single-Agent and Strategic Environments

Dynamic Framework and Berk–Nash Equilibrium

In recurrent decision environments, where agents' choices shape future observations, beliefs evolve in a feedback loop. The paper defines the Berk–Nash equilibrium: a steady state in which actions are optimal given endogenously generated beliefs, and those beliefs are supported on the KL minimizers (pseudo-true parameters) given the long-run distribution of actions.
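In symbols, a common way to write the single-agent definition is shown below. The notation is assumed for illustration: a finite action set $A$, a mixed action $\sigma \in \Delta(A)$, the true distribution of consequences $Q(\cdot \mid a)$, the agent's model $\{Q_\theta(\cdot \mid a)\}_{\theta \in \Theta}$, and a payoff $\pi(a, y)$. The first line defines the pseudo-true parameters given $\sigma$; the second says $\sigma$ is a Berk–Nash equilibrium if some belief supported on them makes every action in the support of $\sigma$ optimal:

$$\Theta(\sigma) = \arg\min_{\theta \in \Theta} \sum_{a \in A} \sigma(a)\, \mathbb{E}_{Q(\cdot \mid a)}\!\left[\ln \frac{Q(Y \mid a)}{Q_\theta(Y \mid a)}\right],$$

$$\exists\, \mu \in \Delta\big(\Theta(\sigma)\big): \quad \operatorname{supp} \sigma \subseteq \arg\max_{a \in A} \int_{\Theta} \mathbb{E}_{Q_\theta(\cdot \mid a)}\big[\pi(a, Y)\big]\, \mu(d\theta).$$

Both conditions refer to the same $\sigma$, which is what makes the definition a fixed point: actions determine which parameters are pseudo-true, and beliefs over those parameters determine which actions are optimal.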

This fixed-point equilibrium generalizes Nash and self-confirming equilibrium to encompass both model misspecification and the endogeneity of data under strategic interaction. In repeated settings, under standard conditions, if actions or empirical action frequencies converge, the limit is a Berk–Nash equilibrium (standard or generalized, depending on identification conditions).

Single-Agent Examples

Canonical cases include effort choice with overconfidence (systematic bias in perceived ability), trade with adverse selection under correlation neglect, and monopoly pricing with an incorrect demand model. In each, the steady state is a fixed point in which actions that are optimal under the agent's beliefs generate data that reinforce those beliefs; the beliefs need not be correct, but they are stable under the Bayesian learning and quasi-experimentation induced by the agent's misspecified model.
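To make the fixed point concrete, here is a hypothetical single-agent example in the same spirit (it is not one of the chapter's examples). The agent chooses effort $a$ on a grid, pays cost $0.5a^2$, and believes output is $y = \theta a + \varepsilon$ with standard normal noise, while in truth $y = \sqrt{a} + \varepsilon$. With equal known variances, the KL divergence at action $a$ is $(\sqrt{a} - \theta a)^2 / 2$, minimized at $\theta = \sqrt{a}/a$, so a pure action is a Berk–Nash equilibrium when it is the subjective best response to that $\theta$. The functional forms, cost parameter, and grid are all assumptions.

```python
import math

C = 0.5                                                # hypothetical cost parameter
ACTIONS = [round(0.05 * k, 2) for k in range(1, 41)]   # hypothetical grid 0.05..2.00

def true_mean(a: float) -> float:
    """Hypothetical true mean outcome: concave in the action."""
    return math.sqrt(a)

def pseudo_true_theta(a: float) -> float:
    """KL minimizer for the model y ~ N(theta * a, 1) when truly y ~ N(sqrt(a), 1).

    KL = (sqrt(a) - theta * a)^2 / 2, minimized at theta = sqrt(a) / a.
    """
    return true_mean(a) / a

def best_response(theta: float) -> float:
    """Grid action maximizing the agent's subjective payoff theta * a - C * a^2."""
    return max(ACTIONS, key=lambda b: theta * b - C * b * b)

def true_optimum() -> float:
    """Grid action maximizing the true expected payoff sqrt(a) - C * a^2."""
    return max(ACTIONS, key=lambda b: true_mean(b) - C * b * b)

# A pure action is a Berk-Nash equilibrium if it is optimal given the
# pseudo-true parameter that its own data would generate.
equilibria = [a for a in ACTIONS if best_response(pseudo_true_theta(a)) == a]

print("pure Berk-Nash actions:", equilibria)
print("objectively optimal action:", true_optimum())
```

On this grid, the unique pure Berk–Nash action is 1.0 while the objectively optimal action is 0.65: in steady state the misspecified agent exerts too much effort, and the data generated at that action confirm its belief $\theta = 1$. Mixed equilibria are not checked in this sketch.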

The analysis is extended in two directions: to noisy (random-utility) environments, where convergence of “intended” strategies is interpreted through payoff perturbations, and to forward-looking agents, where, under weak identification, dynamic and myopic Berk–Nash equilibria coincide in steady state.
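One common way to formalize the payoff-perturbation idea is sketched below under assumed notation: $\bar U_\mu(a)$ is the agent's subjective expected payoff from action $a$ under belief $\mu$, $\Theta(\sigma)$ is the set of parameters minimizing the $\sigma$-weighted KL divergence to the truth, and logit shocks with precision $\lambda$ are an illustrative special case, not necessarily the chapter's specification:

$$\sigma(a) = \frac{\exp\big(\lambda\, \bar U_\mu(a)\big)}{\sum_{b \in A} \exp\big(\lambda\, \bar U_\mu(b)\big)} \quad \text{for all } a \in A, \qquad \text{for some } \mu \in \Delta\big(\Theta(\sigma)\big).$$

This choice rule arises when each action's payoff receives an i.i.d. type-I extreme-value shock with scale $1/\lambda$; every action is then played with positive probability, and as $\lambda \to \infty$ the rule approaches exact optimization.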

Strategic Settings

The framework generalizes to games in which each player's perceived model may be misspecified. Beliefs must then jointly satisfy within-player KL minimization (given the empirical distribution of observed play) and across-player best-response optimality. The equilibrium characterization is structurally robust: under endogenous data, repeated play by (possibly biased) agents leads to steady states in which every player's strategy is a best response to their own misspecified but endogenously updated model, subject to constraints from out-of-equilibrium inferences, identification, and the richness of the subjective model class.
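A sketch of the game version, in assumed notation that follows the standard formulation in this literature and may differ in detail from the chapter: player $i$ has actions $A_i$, payoff $\pi_i(a_i, y_i)$, and model $\{Q^i_{\theta_i}(\cdot \mid a_i)\}_{\theta_i \in \Theta_i}$, while the true distribution of $i$'s consequences, $Q^i_\sigma(\cdot \mid a_i)$, depends on the whole strategy profile $\sigma$ through the other players' play:

$$\Theta_i(\sigma) = \arg\min_{\theta_i \in \Theta_i} \sum_{a_i \in A_i} \sigma_i(a_i)\, \mathbb{E}_{Q^i_\sigma(\cdot \mid a_i)}\!\left[\ln \frac{Q^i_\sigma(Y_i \mid a_i)}{Q^i_{\theta_i}(Y_i \mid a_i)}\right],$$

$$\text{for each } i,\ \exists\, \mu_i \in \Delta\big(\Theta_i(\sigma)\big): \quad \operatorname{supp} \sigma_i \subseteq \arg\max_{a_i \in A_i} \int_{\Theta_i} \mathbb{E}_{Q^i_{\theta_i}(\cdot \mid a_i)}\big[\pi_i(a_i, Y_i)\big]\, \mu_i(d\theta_i).$$

The first line is the within-player KL condition and the second the best-response condition; the coupling across players runs through $Q^i_\sigma$, since each player's data depend on what the others do.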

Convergence Analyses

Three complementary approaches, each supported by specific mathematical tools, are used to study the asymptotic behavior:

Convergence of Actions: With finitely many actions and consequences, if actions converge, the limit is a (uniform) Berk–Nash equilibrium. Uniformly strict Berk–Nash equilibrium is necessary and sufficient for uniform stability, even with forward-looking agents and subexponentially thin priors. The notion of “positive attractiveness” is formally defined under subjectively exogenous outcomes.
Convergence of Empirical Action Frequencies: When the empirical action frequency converges, its limit is a (generalized) Berk–Nash equilibrium. Stochastic-approximation techniques link the discrete process to a differential inclusion governing long-run frequencies, and global attraction in the associated dynamical system yields almost sure convergence.
Convergence of Beliefs: Using modern martingale techniques and introducing the concept of $q$-dominance (a local order on KL divergence), it is shown that, under certain continuity and uniqueness conditions, beliefs (posteriors) converge to degenerate (point-mass) beliefs at KL minimizers when those minimizers are locally stable. Conversely, beliefs that do not support a Berk–Nash equilibrium are unstable. Iterated elimination of KL-dominated beliefs is shown to capture the globally stable set.
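A minimal simulation of these convergence results in a hypothetical setting (functional forms, prior, and parameters are all assumptions): a myopic agent believes $y = \theta a + \varepsilon$ with $\varepsilon \sim N(0,1)$, holds a normal prior on $\theta$, and each period chooses the action $a \in [0.05, 2]$ that maximizes $\mathbb{E}[\theta]\,a - 0.5a^2$, while the true outcome is $y = \sqrt{a} + \varepsilon$. Because the prior is conjugate, the posterior stays normal and updates in closed form. In the unique steady state, $a = \mathbb{E}[\theta]$ and the pseudo-true parameter at $a$ is $\sqrt{a}/a$, so both the action and the posterior mean should approach 1, with the posterior collapsing toward a point mass, rather than the objectively optimal action of about 0.63.

```python
import math
import random

C = 0.5                           # hypothetical cost parameter
A_MIN, A_MAX = 0.05, 2.0          # hypothetical action bounds
PRIOR_MEAN, PRIOR_VAR = 2.0, 1.0  # hypothetical normal prior on theta
NOISE_VAR = 1.0                   # the agent knows the noise variance correctly

def simulate(periods: int, seed: int = 0):
    """Myopic Bayesian agent with the misspecified model y = theta * a + noise.

    The true outcome is y = sqrt(a) + noise, which no single theta reproduces
    at every action. Returns the last action, posterior mean, posterior variance.
    """
    rng = random.Random(seed)
    precision = 1.0 / PRIOR_VAR
    weighted_sum = PRIOR_MEAN / PRIOR_VAR   # numerator of the posterior mean
    action = A_MAX
    for _ in range(periods):
        mean = weighted_sum / precision
        # Myopic best response to the posterior mean: maximize mean * a - C * a^2.
        action = min(A_MAX, max(A_MIN, mean / (2 * C)))
        y = math.sqrt(action) + rng.gauss(0.0, math.sqrt(NOISE_VAR))
        # Conjugate normal update for y ~ N(theta * action, NOISE_VAR).
        precision += action * action / NOISE_VAR
        weighted_sum += action * y / NOISE_VAR
    return action, weighted_sum / precision, 1.0 / precision

action, theta_mean, theta_var = simulate(20_000)
print(f"final action {action:.3f}, posterior mean {theta_mean:.3f}, "
      f"posterior sd {math.sqrt(theta_var):.4f}")
```

This illustrates the action and belief channels together, but it does not reproduce the chapter's conditions: the convergence-of-actions results above assume finitely many actions, whereas this sketch uses a continuum.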

These techniques are powerful but have precise domains of applicability, which the chapter outlines in detail: identification, the action space, the nature of the agent's policy mapping from beliefs to actions, and the technical structure of the parameter space $\Theta$.

Implications and Future Directions

This chapter rigorously demonstrates that, in a wide class of economic (and potentially machine learning) settings, agents who repeatedly optimize and update under a misspecified model will, in the long run, behave as if the environment were the best KL approximation to reality within their model class. Asymptotic learning leads not to the truth but to pseudo-truths determined by the constraints of the agent's subjective model class. This framework suggests several theoretical and practical implications:

Prediction and Policy: Structural bias in belief formation, especially under endogenously generated data feedback, can systematically and robustly prevent correct learning. This provides an analytical basis for longstanding policy challenges such as behavioral biases, selection neglect in markets, and failures of experimentation and exploration.
Limits to Identification: The analysis clarifies the minimal requirements for identification of parameters in steady state, including necessary conditions for belief and frequency convergence.
Misspecification as a Choice Variable: The closing discussion notes the importance, for both normative and positive theory, of endogenizing the process by which agents select their model class, recognize possible misspecifications, or incorporate meta-beliefs about model adequacy.
Extensions and Open Problems: Robustness to time-varying or evolving model spaces, implications for more complex dynamic environments (non-Markovian data, rare signals, infinite-dimensional parameters), and extensions to networks and social inference are identified as active and important areas.

Conclusion

The paper provides a comprehensive technical synthesis of the theory of learning and equilibrium under model misspecification in economics, linking statistical theory, behavioral assumptions, and game theory. Its framework and results lay the foundation for a precise taxonomy of steady-state (and transient) outcomes in belief-based learning under broad forms of model error, and equip researchers with tools for comparative statics, stability analysis, and the design of environments and policies robust to persistent belief distortions. The apparatus is also directly relevant to learning in games, market design, and the study of inductive biases in social and economic inference. Such tools matter both for understanding the limits of asymptotic rationality and for designing interventions and robust learning systems when specification error is inevitable.

Reference: "Learning and Equilibrium under Model Misspecification" (2601.09891)
