
Misspecified Bayesian Learning

Updated 3 December 2025
  • The paper investigates how standard Bayesian updating concentrates on a pseudo-true parameter by minimizing KL divergence, leading to miscalibrated credible intervals.
  • It proposes remedies such as tempering the likelihood using a learning rate (η) and the SafeBayes algorithm to restore calibration and enhance predictive performance.
  • It reviews modular, restricted, and projection methods that isolate robust components in complex models to improve uncertainty quantification and generalization.

Misspecified Bayesian Learning describes Bayesian inference when the postulated statistical model does not contain the true data-generating process. Standard Bayesian updating, grounded in a specific likelihood and prior, typically concentrates its posterior on the parameter minimizing the Kullback–Leibler (KL) divergence between truth and model, but may exhibit miscalibration and suboptimal generalization under misspecification. The resulting pseudo-true parameter often lacks a meaningful interpretation, and credible intervals can be badly miscalibrated, prompting a rigorous examination of concentration, uncertainty quantification, predictive behavior, and remedies for misspecification.

1. Formal Structure and Concentration under Misspecification

Misspecified Bayesian learning begins by observing data $Z^n=(Z_1,\ldots,Z_n)\sim P$ with the aim of fitting a parametric family $\{p_\theta : \theta\in\Theta\}$. Misspecification means $P\notin\{p_\theta : \theta\in\Theta\}$. The key problem is that standard Bayesian inference concentrates on the pseudo-true parameter

$$\theta^* = \arg\min_{\theta\in\Theta} D(P\,\Vert\,p_\theta),$$

where $D(P\,\Vert\,p_\theta)=E_{Z\sim P}\left[-\log p_\theta(Z)\right] - H(P)$ is the KL divergence and $H(P)$ is the entropy of $P$ (Heide et al., 2019, Nott et al., 2023). In decision-theoretic terms, with loss $\ell_\theta(z)=-\log p_\theta(z)$, $\theta^*$ minimizes the risk $R(\theta)=E_P[\ell_\theta(Z)]$.
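
To make the pseudo-true target concrete, the following sketch estimates $\theta^*$ for a Gaussian location-scale family fit to heavy-tailed data by minimizing a Monte Carlo estimate of the risk $R(\theta)$. The Student-t truth, sample size, and optimizer are illustrative assumptions, not choices from the cited papers.

```python
# Minimal sketch: locating the pseudo-true parameter theta* by minimizing a
# Monte Carlo estimate of the expected negative log-likelihood (the KL
# divergence up to the constant H(P)).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
z = stats.t(df=3).rvs(size=50_000, random_state=rng)  # draws from P, which is not in the model

def risk(theta):
    """R(theta) = E_P[-log p_theta(Z)], estimated by Monte Carlo."""
    mu, log_sigma = theta
    return -stats.norm(mu, np.exp(log_sigma)).logpdf(z).mean()

theta_star = optimize.minimize(risk, x0=np.array([0.0, 0.0])).x
print("pseudo-true (mu, sigma):", theta_star[0], np.exp(theta_star[1]))
```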

The standard posterior,

$$\pi_n(\theta \mid Z^n) \propto \pi_0(\theta)\, L(\theta; Z^n)$$

with $L$ the likelihood, concentrates on $\theta^*$ as $n\to\infty$. However, uncertainty quantification is unreliable: the frequentist coverage of credible sets can differ sharply from nominal levels. Under regularity conditions, the actual coverage is governed by a "sandwich" covariance formula involving both the expected Hessian and the score covariance, which no longer coincide under misspecification (Frazier et al., 2023).
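
The coverage gap can be seen by comparing the variance the misspecified posterior reports with the sandwich variance. A minimal sketch, assuming a Gaussian location model with fixed scale fit to Student-t data (plug-in estimates stand in for the pseudo-true parameter):

```python
# Sandwich covariance A^{-1} B A^{-1}: A is the per-observation Hessian of the
# negative log-likelihood, B the per-observation score covariance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = stats.t(df=3).rvs(size=10_000, random_state=rng)   # data from P
sigma0 = 1.0                                            # (wrongly) assumed known scale

mu_hat = z.mean()
scores = (z - mu_hat) / sigma0**2        # per-observation score of the log-likelihood
A = 1.0 / sigma0**2                      # per-observation Hessian of the negative log-likelihood
B = scores.var()                         # per-observation score covariance

n = z.size
bayes_var = 1.0 / (n * A)                # what the misspecified posterior reports for the mean
sandwich_var = B / (n * A**2)            # correct asymptotic variance of the point estimate
print("posterior variance:", bayes_var, " sandwich variance:", sandwich_var)
```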

2. Concentration Properties and Remedies

2.1 SafeBayesian and η-Generalized Posteriors

To restore concentration and calibration under misspecification, the likelihood can be tempered: $\pi_n^\eta(\theta \mid Z^n) \propto \pi_0(\theta)\, L(\theta;Z^n)^\eta$ for a learning rate $\eta\in(0,1)$. For generalized linear models (GLMs), the central condition holds with some constant $\bar{\eta}>0$: for all $\theta$, $E_{Z\sim P}\!\left[\exp\!\big(-\bar{\eta}\,(\ell_\theta(Z)-\ell_{\theta^*}(Z))\big)\right]\leq 1$ (Heide et al., 2019). When this holds, for $0<\eta<\bar{\eta}$ the $\eta$-generalized posterior concentrates rapidly around $\theta^*$ in a misspecification-specific metric $d_{\bar{\eta}}$, at rate $O\!\left(\tfrac{d}{n}\log n\right)$. Under exponential-tail assumptions, excess-risk bounds of order $O\!\left(\tfrac{d}{n}(\log n)^2\right)$ follow.
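
A minimal sketch of the $\eta$-tempered posterior in a conjugate normal-location model, where raising the likelihood to the power $\eta$ amounts to shrinking the effective sample size from $n$ to $\eta n$. The prior and data-generating choices are illustrative assumptions:

```python
import numpy as np

def tempered_posterior(z, eta, sigma=1.0, tau=10.0):
    """Posterior mean/variance of mu under N(mu, sigma^2) likelihood^eta and N(0, tau^2) prior."""
    n = z.size
    precision = 1.0 / tau**2 + eta * n / sigma**2
    mean = (eta * z.sum() / sigma**2) / precision
    return mean, 1.0 / precision

rng = np.random.default_rng(2)
z = rng.standard_t(df=3, size=500)          # misspecified: data are not Gaussian
for eta in (1.0, 0.5, 0.25):
    m, v = tempered_posterior(z, eta)
    print(f"eta={eta}: posterior mean {m:.3f}, sd {np.sqrt(v):.3f}")
```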

The "SafeBayes" algorithm selects η\eta via an online minimization of the cumulative posterior-randomized log-loss: S(η)=i=1nEθπi1η[logpθ(Zi)]S(\eta) = \sum_{i=1}^n E_{\theta\sim\pi_{i-1}^\eta}[-\log p_\theta(Z_i)] and recommends η^=argminηS(η)\hat{\eta}=\arg\min_\eta S(\eta). This robustly chooses η^<1\hat\eta<1 when misspecification is present, restoring calibration and predictive performance (Heide et al., 2019, Grünwald et al., 2014).

2.2 Score-Based Approximations: The Q-Posterior

Safe uncertainty quantification is achievable using the Q-posterior, a quadratic form in the score that leverages its empirical covariance matrix $W_n(\theta)$:

$$\pi_n^Q(\theta) \propto \pi(\theta)\, |W_n(\theta)|^{-1/2} \exp\left\{-\tfrac{1}{2}\,[m_n(\theta)/\sqrt{n}]^\top W_n(\theta)^{-1}\, [m_n(\theta)/\sqrt{n}]\right\},$$

where $m_n(\theta) = -\nabla_\theta \ell_n(\theta)$. This yields credible sets with correct frequentist coverage irrespective of misspecification and applies directly to latent-variable and generalized-loss posteriors (Frazier et al., 2023, Nott et al., 2023).
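
A minimal sketch evaluating this Q-posterior log-density on a grid for a one-parameter Gaussian location model fit to heavy-tailed data; the flat prior and working model are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
z = stats.t(df=3).rvs(size=2_000, random_state=rng)
n = z.size

def q_log_posterior(theta, sigma=1.0):
    s = (z - theta) / sigma**2                 # per-observation score of log p_theta
    m_n = s.sum()                              # m_n(theta): summed score
    W_n = s.var()                              # empirical score covariance
    return -0.5 * np.log(W_n) - 0.5 * (m_n / np.sqrt(n)) ** 2 / W_n   # + log prior (flat)

grid = np.linspace(-0.5, 0.5, 201)
logp = np.array([q_log_posterior(t) for t in grid])
print("Q-posterior mode near:", grid[np.argmax(logp)])
```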

3. Modular, Restricted, and Projection Methods for Robust Inference

Misspecification often affects only particular parts (modules) of a complex model. Modular inference ("cutting feedback") restricts posterior updating in the affected module, preventing contaminated feedback from propagating. Restricted likelihood methods, such as Bayesian restricted likelihood (BRL), base inference only on a data summary $S=s(y)$ chosen to remain robust under misspecification. Projected inference methods fit a nonparametric (or highly flexible) reference model $F$ and project posterior draws onto a simplified model $S$ via KL-projection:

$$\theta_S^\perp = \arg\max_{\theta_S} E_{F}\left[\log p(y \mid \theta_S, S)\right].$$

These methods redefine inference so that it targets interpretable parameters or predictive quantities that are robust to misspecification (Nott et al., 2023, Li, 2023, Smith et al., 2023). A sketch of the projection step appears after the table below.

Table: Modular and Restricted Bayesian Remedies

Method          Summary Statistic / Module    Posterior Target
BRL             Robust summary $s(y)$         $\pi_r(\theta \mid s)$
Cut (modular)   Submodel $\phi$               $\pi_{\text{cut}}(\phi \mid X)$, $\pi(\zeta \mid Y, \phi)$
KL-projection   Flexible $F$, target $S$      $\{\theta_S^\perp(F)\}$
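
As a concrete illustration of the projection row above, the following sketch uses a kernel density estimate as the flexible reference $F$ and projects it onto a simple Gaussian model by maximizing the reference-averaged log-likelihood. The KDE reference, bimodal data, and optimizer are illustrative assumptions:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 700)])  # observed data

reference = stats.gaussian_kde(y)                 # flexible reference model F
y_ref = reference.resample(20_000, seed=rng)[0]   # draws approximating F

def neg_projected_loglik(theta):
    """Negative of E_F[log p(y | theta_S)] for a simple Gaussian target model."""
    mu, log_sigma = theta
    return -stats.norm(mu, np.exp(log_sigma)).logpdf(y_ref).mean()

theta_proj = optimize.minimize(neg_projected_loglik, x0=np.array([0.0, 0.0])).x
print("projected (mu, sigma):", theta_proj[0], np.exp(theta_proj[1]))
```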

4. Predictive Performance and Generalization under Misspecification

Classical PAC-Bayes bounds and Bayesian model averaging provide suboptimal generalization guarantees when models are misspecified. Second-order PAC-Bayes bounds incorporate a variance (diversity) correction term, leading to new algorithms (PAC²-Variational, PAC²-Ensemble) that optimize the predictive cross-entropy directly:

$$CE(\rho) = E_{x\sim\nu}\left[-\ln p_{\text{pred}}(x)\right], \qquad p_{\text{pred}}(x)=E_{\theta\sim\rho}\left[p(x \mid \theta)\right].$$

These Bayesian-like but non-Bayesian posterior constructions consistently outperform standard Bayesian predictions in empirical and simulated settings, especially under heavy misspecification (Masegosa, 2019).
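
A minimal sketch of the predictive cross-entropy objective, comparing the mixture predictive of a two-member ensemble with a single fitted Gaussian on bimodal data. The ensemble members and data are illustrative assumptions, not the PAC²-Variational or PAC²-Ensemble algorithms themselves:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])   # held-out data from nu

components = [stats.norm(-2, 1), stats.norm(2, 1)]                    # ensemble members
weights = np.array([0.5, 0.5])                                        # rho over members

p_pred = sum(w * c.pdf(x) for w, c in zip(weights, components))       # mixture predictive
single = stats.norm(x.mean(), x.std()).pdf(x)                         # one moment-matched Gaussian

print("ensemble CE:", -np.log(p_pred).mean(), " single-model CE:", -np.log(single).mean())
```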

5. Misspecified Bayesianism: Observational Equivalence and Rationalizability

A sequence of beliefs is consistent with "misspecified Bayesianism" if the prior contains a "grain" (mixture) of the average posterior, formalized as a partition-based grain condition. With full-support priors over finite or compact spaces, any observed law of posteriors is MB-rationalizable, but misspecified Bayesianism imposes tail limitations on unbounded spaces—precluding heavy-tailed posteriors from light-tailed priors. The upshot is that many heuristic or apparently non-Bayesian updating schemes are observationally indistinguishable from Bayesian updating under implicit misspecification (Molavi, 30 Jul 2025).

6. Learning Dynamics, Equilibrium, and Economic Implications

Learning dynamics with misspecified models converge to generalized equilibria that balance optimal actions against best-fitting subjective beliefs (KL-minimizers), formalized via Berk–Nash equilibrium (Esponda et al., 2019, Li et al., 30 May 2024, Ghosh, 24 Jul 2024). In principal-agent contracts or social learning environments, misspecification can generate persistent biases, cycling, or sharp welfare losses, even when approximate rationality is maintained. Computational complexity results show that even weak misspecification can render equilibrium computation prohibitively hard for large action spaces (Li et al., 30 May 2024). In bandit and meta-learning contexts, prior misspecification degrades performance gracefully (by at most $O(H^2\epsilon)$ in the horizon $H$ and TV-distance $\epsilon$), but learning the prior dynamically across tasks recovers oracle performance (Simchowitz et al., 2021, Peleg et al., 2021).

7. Practical Guidelines and Implementation

Misspecification detection and correction in practice involves monitoring diagnostic quantities (e.g., mixability gaps or squared-error risk), implementing SafeBayes or Q-posterior samplers, and experimenting with tempered posteriors ($\eta<1$). In deep learning contexts, replacing Gaussian assumptions with heavy-tailed likelihoods or meta-learned priors yields immediate empirical performance gains, often obviating the need for cold (over-sharpened) posteriors (Vaart et al., 29 Aug 2025). Modular, restricted, and projection approaches should be considered in complex models, especially to isolate trusted modules or robust aspects of the data (Nott et al., 2023).
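
As one concrete instance of the heavy-tailed-likelihood remedy, the following sketch compares held-out log-loss under a Gaussian versus a Student-t observation model, each fit by maximum likelihood. The fixed degrees of freedom, data-generating process, and train/test split are illustrative assumptions:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(7)
data = stats.t(df=2.5, loc=1.0).rvs(size=2_000, random_state=rng)
train, test = data[:1_000], data[1_000:]

def fit(neg_loglik):
    """Maximum-likelihood fit of (location, log-scale) by generic optimization."""
    return optimize.minimize(neg_loglik, x0=np.array([0.0, 0.0])).x

mu_g, ls_g = fit(lambda p: -stats.norm(p[0], np.exp(p[1])).logpdf(train).sum())
mu_t, ls_t = fit(lambda p: -stats.t(df=4, loc=p[0], scale=np.exp(p[1])).logpdf(train).sum())

print("Gaussian test log-loss: ", -stats.norm(mu_g, np.exp(ls_g)).logpdf(test).mean())
print("Student-t test log-loss:", -stats.t(df=4, loc=mu_t, scale=np.exp(ls_t)).logpdf(test).mean())
```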
