
Misspecified Bayesian Learning

Updated 3 December 2025
  • The paper investigates how standard Bayesian updating concentrates on a pseudo-true parameter by minimizing KL divergence, leading to miscalibrated credible intervals.
  • It proposes remedies such as tempering the likelihood using a learning rate (η) and the SafeBayes algorithm to restore calibration and enhance predictive performance.
  • It reviews modular, restricted, and projection methods that isolate robust components in complex models to improve uncertainty quantification and generalization.

Misspecified Bayesian Learning describes Bayesian inference when the postulated statistical model does not contain the true data-generating process. Standard Bayesian updating, grounded in a specific likelihood and prior, typically concentrates its posterior on the parameter minimizing the Kullback–Leibler (KL) divergence between truth and model, but may exhibit miscalibration and suboptimal generalization under misspecification. The resulting pseudo-true parameter often lacks a meaningful interpretation, and credible intervals can be badly miscalibrated, prompting a rigorous examination of concentration, uncertainty quantification, predictive behavior, and remedies for misspecification.

1. Formal Structure and Concentration under Misspecification

Misspecified Bayesian learning begins by observing data $Z^n=(Z_1,\ldots,Z_n)\sim P$ with the aim of fitting a parametric family $\{p_\theta : \theta\in\Theta\}$. Misspecification means $P\notin\{p_\theta : \theta\in\Theta\}$. The key problem is that standard Bayesian inference concentrates on the pseudo-true parameter

$$\theta^* = \arg\min_{\theta\in\Theta} D(P\,\Vert\,p_\theta),$$

where $D(P\,\Vert\,p_\theta)=E_{Z\sim P}\left[-\log p_\theta(Z)\right] - H(P)$ is the KL divergence and $H(P)$ is the entropy of $P$ (Heide et al., 2019, Nott et al., 2023). In decision-theoretic terms, with loss $\ell_\theta(z)=-\log p_\theta(z)$, $\theta^*$ minimizes the risk $R(\theta)=E_P[\ell_\theta(Z)]$.
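
To make the pseudo-true target concrete, the following sketch estimates $\theta^*$ for a Gaussian location-scale family fit to heavy-tailed data by minimizing a Monte Carlo estimate of the risk $R(\theta)$. The Student-t truth, sample size, and optimizer are illustrative assumptions, not choices from the cited papers.

```python
# Minimal sketch: locating the pseudo-true parameter theta* by minimizing a
# Monte Carlo estimate of the expected negative log-likelihood (the KL
# divergence up to the constant H(P)).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
z = stats.t(df=3).rvs(size=50_000, random_state=rng)  # draws from P, which is not in the model

def risk(theta):
    """R(theta) = E_P[-log p_theta(Z)], estimated by Monte Carlo."""
    mu, log_sigma = theta
    return -stats.norm(mu, np.exp(log_sigma)).logpdf(z).mean()

theta_star = optimize.minimize(risk, x0=np.array([0.0, 0.0])).x
print("pseudo-true (mu, sigma):", theta_star[0], np.exp(theta_star[1]))
```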

The standard posterior,

$$\pi_n(\theta \mid Z^n) \propto \pi_0(\theta)\, L(\theta; Z^n)$$

with $L$ the likelihood, concentrates on $\theta^*$ as $n\to\infty$. However, uncertainty quantification is unreliable: the frequentist coverage of credible sets can differ sharply from nominal levels. Under regularity conditions, the actual coverage is governed by a "sandwich" covariance formula involving both the expected Hessian and the score covariance, which no longer coincide under misspecification (Frazier et al., 2023).
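
The coverage gap can be seen by comparing the variance the misspecified posterior reports with the sandwich variance. A minimal sketch, assuming a Gaussian location model with fixed scale fit to Student-t data (plug-in estimates stand in for the pseudo-true parameter):

```python
# Sandwich covariance A^{-1} B A^{-1}: A is the per-observation Hessian of the
# negative log-likelihood, B the per-observation score covariance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = stats.t(df=3).rvs(size=10_000, random_state=rng)   # data from P
sigma0 = 1.0                                            # (wrongly) assumed known scale

mu_hat = z.mean()
scores = (z - mu_hat) / sigma0**2        # per-observation score of the log-likelihood
A = 1.0 / sigma0**2                      # per-observation Hessian of the negative log-likelihood
B = scores.var()                         # per-observation score covariance

n = z.size
bayes_var = 1.0 / (n * A)                # what the misspecified posterior reports for the mean
sandwich_var = B / (n * A**2)            # correct asymptotic variance of the point estimate
print("posterior variance:", bayes_var, " sandwich variance:", sandwich_var)
```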

2. Concentration Properties and Remedies

2.1 SafeBayesian and η-Generalized Posteriors

To restore concentration and calibration under misspecification, the likelihood can be tempered: $\pi_n^\eta(\theta \mid Z^n) \propto \pi_0(\theta)\, L(\theta;Z^n)^\eta$ for a learning rate $\eta\in(0,1)$. For generalized linear models (GLMs), the central condition holds with some constant $\bar{\eta}>0$: for all $\theta$, $E_{Z\sim P}\!\left[\exp\!\big(-\bar{\eta}\,(\ell_\theta(Z)-\ell_{\theta^*}(Z))\big)\right]\leq 1$ (Heide et al., 2019). When this holds, for $0<\eta<\bar{\eta}$ the $\eta$-generalized posterior concentrates rapidly around $\theta^*$ in a misspecification-specific metric $d_{\bar{\eta}}$, at rate $O\!\left(\tfrac{d}{n}\log n\right)$. Under exponential-tail assumptions, excess-risk bounds of order $O\!\left(\tfrac{d}{n}(\log n)^2\right)$ follow.
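
A minimal sketch of the $\eta$-tempered posterior in a conjugate normal-location model, where raising the likelihood to the power $\eta$ amounts to shrinking the effective sample size from $n$ to $\eta n$. The prior and data-generating choices are illustrative assumptions:

```python
import numpy as np

def tempered_posterior(z, eta, sigma=1.0, tau=10.0):
    """Posterior mean/variance of mu under N(mu, sigma^2) likelihood^eta and N(0, tau^2) prior."""
    n = z.size
    precision = 1.0 / tau**2 + eta * n / sigma**2
    mean = (eta * z.sum() / sigma**2) / precision
    return mean, 1.0 / precision

rng = np.random.default_rng(2)
z = rng.standard_t(df=3, size=500)          # misspecified: data are not Gaussian
for eta in (1.0, 0.5, 0.25):
    m, v = tempered_posterior(z, eta)
    print(f"eta={eta}: posterior mean {m:.3f}, sd {np.sqrt(v):.3f}")
```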

The "SafeBayes" algorithm selects η\eta via an online minimization of the cumulative posterior-randomized log-loss: S(η)=i=1nEθπi1η[logpθ(Zi)]S(\eta) = \sum_{i=1}^n E_{\theta\sim\pi_{i-1}^\eta}[-\log p_\theta(Z_i)] and recommends η^=argminηS(η)\hat{\eta}=\arg\min_\eta S(\eta). This robustly chooses η^<1\hat\eta<1 when misspecification is present, restoring calibration and predictive performance (Heide et al., 2019, Grünwald et al., 2014).

2.2 Score-Based Approximations: The Q-Posterior

Safe uncertainty quantification is achievable using the Q-posterior, a quadratic form in the score that leverages its empirical covariance matrix $W_n(\theta)$:

$$\pi_n^Q(\theta) \propto \pi(\theta)\, |W_n(\theta)|^{-1/2} \exp\left\{-\tfrac{1}{2}\,[m_n(\theta)/\sqrt{n}]^\top W_n(\theta)^{-1}\, [m_n(\theta)/\sqrt{n}]\right\},$$

where $m_n(\theta) = -\nabla_\theta \ell_n(\theta)$. This yields credible sets with correct frequentist coverage irrespective of misspecification and applies directly to latent-variable and generalized-loss posteriors (Frazier et al., 2023, Nott et al., 2023).
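
A minimal sketch evaluating this Q-posterior log-density on a grid for a one-parameter Gaussian location model fit to heavy-tailed data; the flat prior and working model are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
z = stats.t(df=3).rvs(size=2_000, random_state=rng)
n = z.size

def q_log_posterior(theta, sigma=1.0):
    s = (z - theta) / sigma**2                 # per-observation score of log p_theta
    m_n = s.sum()                              # m_n(theta): summed score
    W_n = s.var()                              # empirical score covariance
    return -0.5 * np.log(W_n) - 0.5 * (m_n / np.sqrt(n)) ** 2 / W_n   # + log prior (flat)

grid = np.linspace(-0.5, 0.5, 201)
logp = np.array([q_log_posterior(t) for t in grid])
print("Q-posterior mode near:", grid[np.argmax(logp)])
```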

3. Modular, Restricted, and Projection Methods for Robust Inference

Misspecification often affects only particular parts (modules) of a complex model. Modular inference ("cutting feedback") restricts posterior updating in the affected module, preventing contaminated feedback from propagating. Restricted likelihood methods, such as Bayesian restricted likelihood (BRL), base inference only on a data summary $S=s(y)$ chosen to remain robust under misspecification. Projected inference methods fit a nonparametric (or highly flexible) reference model $F$ and project posterior draws onto a simplified model $S$ via KL-projection:

$$\theta_S^\perp = \arg\max_{\theta_S} E_{F}\left[\log p(y \mid \theta_S, S)\right].$$

These methods redefine inference so that it targets interpretable parameters or predictive quantities that are robust to misspecification (Nott et al., 2023, Li, 2023, Smith et al., 2023). A sketch of the projection step appears after the table below.

Table: Modular and Restricted Bayesian Remedies

Method          Summary Statistic / Module    Posterior Target
BRL             Robust summary $s(y)$         $\pi_r(\theta \mid s)$
Cut (modular)   Submodel $\phi$               $\pi_{\text{cut}}(\phi \mid X)$, $\pi(\zeta \mid Y, \phi)$
KL-projection   Flexible $F$, target $S$      $\{\theta_S^\perp(F)\}$
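
As a concrete illustration of the projection row above, the following sketch uses a kernel density estimate as the flexible reference $F$ and projects it onto a simple Gaussian model by maximizing the reference-averaged log-likelihood. The KDE reference, bimodal data, and optimizer are illustrative assumptions:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 700)])  # observed data

reference = stats.gaussian_kde(y)                 # flexible reference model F
y_ref = reference.resample(20_000, seed=rng)[0]   # draws approximating F

def neg_projected_loglik(theta):
    """Negative of E_F[log p(y | theta_S)] for a simple Gaussian target model."""
    mu, log_sigma = theta
    return -stats.norm(mu, np.exp(log_sigma)).logpdf(y_ref).mean()

theta_proj = optimize.minimize(neg_projected_loglik, x0=np.array([0.0, 0.0])).x
print("projected (mu, sigma):", theta_proj[0], np.exp(theta_proj[1]))
```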

4. Predictive Performance and Generalization under Misspecification

Classical PAC-Bayes bounds and Bayesian model averaging provide suboptimal generalization guarantees when models are misspecified. Second-order PAC-Bayes bounds incorporate a variance (diversity) correction term, leading to new algorithms (PAC²-Variational, PAC²-Ensemble) that optimize the predictive cross-entropy directly:

$$CE(\rho) = E_{x\sim\nu}\left[-\ln p_{\text{pred}}(x)\right], \qquad p_{\text{pred}}(x)=E_{\theta\sim\rho}\left[p(x \mid \theta)\right].$$

These Bayesian-like but non-Bayesian posterior constructions consistently outperform standard Bayesian predictions in empirical and simulated settings, especially under heavy misspecification (Masegosa, 2019).
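
A minimal sketch of the predictive cross-entropy objective, comparing the mixture predictive of a two-member ensemble with a single fitted Gaussian on bimodal data. The ensemble members and data are illustrative assumptions, not the PAC²-Variational or PAC²-Ensemble algorithms themselves:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])   # held-out data from nu

components = [stats.norm(-2, 1), stats.norm(2, 1)]                    # ensemble members
weights = np.array([0.5, 0.5])                                        # rho over members

p_pred = sum(w * c.pdf(x) for w, c in zip(weights, components))       # mixture predictive
single = stats.norm(x.mean(), x.std()).pdf(x)                         # one moment-matched Gaussian

print("ensemble CE:", -np.log(p_pred).mean(), " single-model CE:", -np.log(single).mean())
```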

5. Misspecified Bayesianism: Observational Equivalence and Rationalizability

A sequence of beliefs is consistent with "misspecified Bayesianism" if the prior contains a "grain" (mixture) of the average posterior, formalized as a partition-based grain condition. With full-support priors over finite or compact spaces, any observed law of posteriors is MB-rationalizable, but misspecified Bayesianism imposes tail limitations on unbounded spaces—precluding heavy-tailed posteriors from light-tailed priors. The upshot is that many heuristic or apparently non-Bayesian updating schemes are observationally indistinguishable from Bayesian updating under implicit misspecification (Molavi, 30 Jul 2025).

6. Learning Dynamics, Equilibrium, and Economic Implications

Learning dynamics with misspecified models converge to generalized equilibria that balance optimal actions against best-fitting subjective beliefs (KL-minimizers), formalized via Berk–Nash equilibrium (Esponda et al., 2019, Li et al., 30 May 2024, Ghosh, 24 Jul 2024). In principal-agent contracts or social learning environments, misspecification can generate persistent biases, cycling, or sharp welfare losses, even when approximate rationality is maintained. Computational complexity results show that even weak misspecification can render equilibrium computation prohibitively hard for large action spaces (Li et al., 30 May 2024). In bandit and meta-learning contexts, prior misspecification degrades performance gracefully (by at most $O(H^2\epsilon)$ in the horizon $H$ and TV-distance $\epsilon$), but learning the prior dynamically across tasks recovers oracle performance (Simchowitz et al., 2021, Peleg et al., 2021).

7. Practical Guidelines and Implementation

Misspecification detection and correction in practice involves monitoring diagnostic quantities (e.g., mixability gaps or squared-error risk), implementing SafeBayes or Q-posterior samplers, and experimenting with tempered posteriors ($\eta<1$). In deep learning contexts, replacing Gaussian assumptions with heavy-tailed likelihoods or meta-learned priors yields immediate empirical performance gains, often obviating the need for cold (over-sharpened) posteriors (Vaart et al., 29 Aug 2025). Modular, restricted, and projection approaches should be considered in complex models, especially to isolate trusted modules or robust aspects of the data (Nott et al., 2023).
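
As one concrete instance of the heavy-tailed-likelihood remedy, the following sketch compares held-out log-loss under a Gaussian versus a Student-t observation model, each fit by maximum likelihood. The fixed degrees of freedom, data-generating process, and train/test split are illustrative assumptions:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(7)
data = stats.t(df=2.5, loc=1.0).rvs(size=2_000, random_state=rng)
train, test = data[:1_000], data[1_000:]

def fit(neg_loglik):
    """Maximum-likelihood fit of (location, log-scale) by generic optimization."""
    return optimize.minimize(neg_loglik, x0=np.array([0.0, 0.0])).x

mu_g, ls_g = fit(lambda p: -stats.norm(p[0], np.exp(p[1])).logpdf(train).sum())
mu_t, ls_t = fit(lambda p: -stats.t(df=4, loc=p[0], scale=np.exp(p[1])).logpdf(train).sum())

print("Gaussian test log-loss: ", -stats.norm(mu_g, np.exp(ls_g)).logpdf(test).mean())
print("Student-t test log-loss:", -stats.t(df=4, loc=mu_t, scale=np.exp(ls_t)).logpdf(test).mean())
```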
