M+ Framework: Robust Bayesian Inference

Updated 28 January 2026
  • M⁺ Framework is a robust Bayesian inference method that constructs posteriors using generic loss functions, subsuming standard Bayesian updating with enhanced resilience to outliers.
  • It achieves asymptotic normality and credible set coverage analogous to the Bernstein–von Mises theorem, ensuring accurate frequentist properties under robust loss choices.
  • Empirical examples, such as Dirichlet-process clustering and Poisson factorization, demonstrate its capacity to recover true clusters and maintain stability under data corruption.

The M⁺ (M-posterior) framework generalizes Bayesian posteriors by connecting Bayesian updating directly to the theory of M-estimators in the frequentist paradigm. It constructs posteriors using a generic loss function that defines an underlying M-estimator, subsuming the standard Bayesian posterior (recovered when the loss is the negative log likelihood) and extending Bayesian methods to a wide class of robust loss functions. The framework establishes foundational results on asymptotic normality, posterior contraction, frequentist coverage, and robustness, introducing the posterior influence function and a posterior breakdown point for rigorous characterization.

1. Definition and Formal Construction

Given i.i.d. data $X^n = (X_1, \dots, X_n)$ and a prior $\pi(\theta)$, the M-posterior is built from a contrast (loss) $\rho(x, \theta)$, which defines the M-estimator
$$\hat\theta_n = \arg\min_\theta \frac{1}{n} \sum_{i=1}^n \rho(X_i, \theta).$$
The corresponding M-posterior density is

$$\pi_n(\theta \mid X^n) \propto \exp(-n L_n(\theta))\, \pi(\theta),$$

where $L_n(\theta) = \frac{1}{n} \sum_{i=1}^n \rho(X_i, \theta)$. Equivalently,

$$\pi_n(\theta \mid X^n) = \frac{\exp\left(-\sum_i \rho(X_i, \theta)\right)\, \pi(\theta)}{Z_n}.$$

Setting $\rho = -\log f$ recovers the standard Bayesian posterior. The framework thus unifies Bayesian updating and robust estimation via user-specified losses (Marusic et al., 1 Oct 2025).
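As a concrete sketch of the definition (assuming only NumPy; the grid-based normalization and the helper names `huber` and `m_posterior` are illustrative, not from the paper), the M-posterior for a one-dimensional location parameter can be evaluated directly, contrasting the quadratic loss with a robust Huber contrast:

```python
import numpy as np

def huber(r, c=1.345):
    """Huber contrast: quadratic near zero, linear in the tails (bounded score)."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def m_posterior(x, grid, rho, log_prior):
    """Normalized M-posterior pi_n(theta) ~ exp(-sum_i rho(x_i - theta)) pi(theta),
    evaluated on a one-dimensional parameter grid."""
    dx = grid[1] - grid[0]
    loss = np.array([rho(x - t).sum() for t in grid])
    logp = -loss + log_prior(grid)
    p = np.exp(logp - logp.max())          # stabilize before exponentiating
    return p / (p.sum() * dx)              # normalize to a density on the grid

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), [50.0]])   # one gross outlier
grid = np.linspace(-3.0, 3.0, 2001)
flat = lambda t: np.zeros_like(t)                          # improper flat prior

post_l2 = m_posterior(x, grid, lambda r: 0.5 * r**2, flat) # standard (Gaussian) posterior
post_hub = m_posterior(x, grid, huber, flat)               # robust M-posterior

dx = grid[1] - grid[0]
mean_l2 = (grid * post_l2).sum() * dx    # dragged upward by the outlier
mean_hub = (grid * post_hub).sum() * dx  # stays near the true location 0
```

Setting `rho = lambda r: 0.5 * r**2` matches the Gaussian negative log-likelihood up to constants, so `post_l2` is the ordinary posterior; swapping in `huber` changes only the loss, exactly as in the construction above.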

2. Asymptotic Normality and the Bernstein–von Mises Theorem

The key asymptotic result states that, under regularity assumptions analogous to local asymptotic normality (WMLAN) and mild conditions on the prior:

  • The loss-based objective admits a second-order Taylor expansion about $\theta^*$, with curvature $V_{\theta^*} = \partial^2 E[\rho(X, \theta)]/\partial\theta^2 \big|_{\theta^*}$ and local parameter $\Delta_n = \sqrt{n}(\hat\theta_n - \theta^*)$.
  • The M-posterior contracts at the root-$n$ rate and is asymptotically Gaussian: $d_{\mathrm{TV}}\bigl(\pi_n(\cdot \mid X^n),\, \mathcal{N}(\hat\theta_n, V_{\theta^*}^{-1}/n)\bigr) \to 0$ in probability, provided the prior is positive and continuous near $\theta^*$. Thus, around $\hat\theta_n$, $\pi_n$ behaves like a normal distribution with variance $V_{\theta^*}^{-1}/n$, paralleling the Bernstein–von Mises theorem for classical Bayesian posteriors (Marusic et al., 1 Oct 2025).
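This convergence can be checked numerically. In the sketch below (assuming NumPy; the flat prior, the finite-difference curvature estimate, and the grid total-variation distance are simplifications introduced here, not the paper's procedure), the total-variation gap between a Huber-loss M-posterior and its Gaussian approximation shrinks as $n$ grows:

```python
import numpy as np

def huber(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def tv_to_bvm(x, grid, rho):
    """Grid total-variation distance between a flat-prior location M-posterior
    and its Bernstein-von Mises approximation N(theta_hat, V^{-1}/n)."""
    dx = grid[1] - grid[0]
    n = len(x)
    loss = rho(x[None, :] - grid[:, None]).sum(axis=1)   # n * L_n(theta)
    p = np.exp(-(loss - loss.min()))
    p /= p.sum() * dx                                    # M-posterior density
    i = np.argmin(loss)
    theta_hat = grid[i]                                  # grid M-estimator
    V = (loss[i - 1] - 2 * loss[i] + loss[i + 1]) / (n * dx**2)  # curvature of L_n
    q = np.exp(-0.5 * n * V * (grid - theta_hat) ** 2)
    q /= q.sum() * dx                                    # BvM Gaussian density
    return 0.5 * np.abs(p - q).sum() * dx

rng = np.random.default_rng(1)
grid = np.linspace(-2.0, 2.0, 4001)
tvs = [tv_to_bvm(rng.normal(0.0, 1.0, n), grid, huber) for n in (20, 200, 2000)]
# The TV gap shrinks toward 0 as n grows, in line with the theorem.
```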

3. Posterior Contraction and Frequentist Coverage

Consequences of asymptotic normality include:

  • $\sqrt{n}$-consistency: for any sequence $M_n \to \infty$,

$$\pi_n\{\|\theta - \theta^*\| > M_n/\sqrt{n} \mid X^n\} \to 0$$

in probability.

  • Bayesian credible balls $C_n = \{\theta : \|\theta - \hat\theta_n\| \leq z_{1-\alpha/2}\, V_{\theta^*}^{-1/2}/\sqrt{n}\}$ satisfy

$$P[\theta^* \in C_n] \to 1 - \alpha,$$

i.e., credible sets attain correct asymptotic frequentist coverage under regularity, even for general choices of $\rho$ (Marusic et al., 1 Oct 2025).
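A small Monte Carlo check of this coverage claim (a sketch assuming NumPy; the flat prior, grid posterior, and quantile-based interval are simplifications made here) counts how often a central 95% credible interval of a Huber-loss M-posterior contains the true location:

```python
import numpy as np

def huber(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

rng = np.random.default_rng(2)
grid = np.linspace(-1.0, 1.0, 801)
reps, hits = 200, 0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, 200)                 # true location theta* = 0
    loss = huber(x[None, :] - grid[:, None]).sum(axis=1)
    p = np.exp(-(loss - loss.min()))              # flat-prior M-posterior (unnormalized)
    cdf = np.cumsum(p) / p.sum()
    lo = grid[np.searchsorted(cdf, 0.025)]        # central 95% credible interval
    hi = grid[np.searchsorted(cdf, 0.975)]
    hits += (lo <= 0.0 <= hi)
coverage = hits / reps
# Empirical coverage lands close to (here, slightly above) the nominal 95% level.
```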

4. Robustness: Posterior Influence Function and Breakdown Point

Posterior Influence Function (PIF)

For contamination $F_{n,\epsilon,x_0} = (1-\epsilon) F_n + \epsilon\, \delta_{x_0}$, the PIF for the posterior density is

$$\mathrm{PIF}(x_0; \theta, F_n) = \left. \frac{d}{d\epsilon}\, \pi_n(\theta \mid F_{n,\epsilon,x_0}) \right|_{\epsilon=0}$$

and can be computed as

$$\mathrm{PIF}(x_0; \theta) = n\, \pi_n(\theta) \left[ g(x_0, \theta) - \int \pi_n(\theta')\, g(x_0, \theta')\, d\theta' \right],$$

where $g(x, \theta) = E_{F_n}[\rho(X, \theta)] - \rho(x, \theta)$. Boundedness of the score $\psi(x, \theta) = \partial\rho(x, \theta)/\partial\theta$ and mild prior conditions ensure that $\sup_{x_0, \theta} |\mathrm{PIF}(x_0; \theta)| < \infty$. For an unbounded $\psi$, as under quadratic loss, the PIF diverges (Marusic et al., 1 Oct 2025).
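The boundedness contrast can be made concrete (a sketch assuming NumPy, a flat prior, and a grid posterior; `pif_sup` is a helper name introduced here) by evaluating the PIF formula for the bounded-score Huber loss versus the quadratic loss as the contamination point $x_0$ moves out:

```python
import numpy as np

def huber(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def pif_sup(x, grid, rho, x0_vals):
    """sup_theta |PIF(x0; theta)| over a grid, for each contamination point x0,
    using PIF = n * pi_n(theta) * (g(x0, theta) - E_pi[g(x0, .)])."""
    n = len(x)
    dx = grid[1] - grid[0]
    mean_loss = rho(x[None, :] - grid[:, None]).mean(axis=1)  # E_Fn[rho(X, theta)]
    p = np.exp(-n * (mean_loss - mean_loss.min()))
    p /= p.sum() * dx                                         # pi_n(theta)
    sups = []
    for x0 in x0_vals:
        g = mean_loss - rho(x0 - grid)                        # g(x0, theta)
        pif = n * p * (g - (p * g).sum() * dx)
        sups.append(np.abs(pif).max())
    return np.array(sups)

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 200)
grid = np.linspace(-2.0, 2.0, 2001)
x0_vals = np.array([2.0, 10.0, 100.0])
sup_hub = pif_sup(x, grid, huber, x0_vals)                # bounded score: saturates
sup_l2 = pif_sup(x, grid, lambda r: 0.5 * r**2, x0_vals)  # unbounded score: keeps growing
```

For Huber, once $|x_0 - \theta|$ exceeds the clipping constant over the whole grid, $g$ changes only by a constant in $\theta$, which the centering term removes, so the supremum saturates; for quadratic loss the $x_0\theta$ cross term grows without bound.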

Posterior Breakdown Point

Let $W_2$ denote the 2-Wasserstein distance. The posterior breakdown point is

$$\epsilon^*_{W_2}(\pi_n, X^n) = \min\left\{ \frac{m}{n} : \sup_{F' \text{ differing in } m \text{ points}} W_2\bigl(\pi_n(\cdot \mid F'),\, \pi_n(\cdot \mid F_n)\bigr) = \infty \right\}.$$

Results for location M-posteriors ($\rho(x - \theta)$ with $\rho$ convex and the score bounded):

  • Flat prior: $\epsilon^*_{W_2} = 1/2$, matching the underlying M-estimator's breakdown point.
  • Exponential-like prior tails: $\epsilon^* \geq 1/2$, approaching $1/2$ as $n \to \infty$.
  • Super-exponential prior tails: no breakdown (the posterior cannot be driven off to infinity). Redescending losses (bounded $\rho$ with $\psi \to 0$) yield $\epsilon^* \geq 1/2$, with equality for a flat prior (Marusic et al., 1 Oct 2025).

Posterior means and quantiles inherit the breakdown point, i.e., $\epsilon^*(\mathrm{mean}) \geq \epsilon^*_{W_2}(\pi_n)$, and similarly for quantiles.
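The location-model breakdown behavior can be illustrated directly (a sketch assuming NumPy; the grid minimizer and corruption scheme are choices made here): replacing up to 40% of a sample with an arbitrarily large value barely moves the Huber M-estimate, the center of the Huber M-posterior, while the quadratic-loss estimate (the sample mean) is destroyed.

```python
import numpy as np

def huber(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def m_estimate(x, rho, grid):
    """Grid minimizer of the average contrast, i.e. the location M-estimator."""
    loss = rho(x[None, :] - grid[:, None]).mean(axis=1)
    return grid[np.argmin(loss)]

rng = np.random.default_rng(4)
clean = rng.normal(0.0, 1.0, 100)        # true location 0
grid = np.linspace(-10.0, 10.0, 4001)

results = {}
for m in (0, 10, 40):                    # number of corrupted observations
    x = clean.copy()
    x[:m] = 1e6                          # send m points off toward infinity
    results[m] = (m_estimate(x, huber, grid), x.mean())
# At 40% corruption (below the 1/2 breakdown point) the Huber estimate
# stays within a couple of units of 0, while the sample mean is ~4e5.
```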

5. Illustrative Applications and Numerical Comparisons

Dirichlet-Process Mixture for Clustering

Using $N = 2000$ points drawn from three skew-normal components, a Dirichlet-process Gaussian mixture clustering experiment demonstrates:

  • Standard Gaussian likelihood posteriors yield 5 spurious clusters due to sensitivity to outliers.
  • The M-posterior with a Huber-type loss on standardized residuals accurately recovers the 3 true clusters, highlighting robustness (Marusic et al., 1 Oct 2025).

Robust Poisson Factorization for Recommendation

In a Poisson factorization of the MovieLens 1M dataset under $0$–$10\%$ random corruption, three approaches are compared:

  • (a) Standard posterior (no reweighting).
  • (b) Reweighted posterior (latent weights $\alpha_u \sim \Gamma(a, b)$).
  • (c) M-posterior with the induced loss $\rho(X_u, v_u) = -\log \int \Gamma(\alpha; a, b)\, f(y \mid \theta)^\alpha\, d\alpha$.

Empirical results (negative out-of-sample log-likelihood):

Corruption   Standard   Reweighted   M-posterior
0%           1.724      1.690        1.689
5%           1.739      1.725        1.727
10%          1.758      1.746        1.748

The M-posterior matches the robustness of the reweighted approach without introducing additional latent weight variables (Marusic et al., 1 Oct 2025).

6. Broader Implications and Applicability

The M⁺ framework encompasses a large class of generalized posteriors defined by convex or redescending losses, under mild regularity requirements, and provides a unified theoretical foundation for robust Bayesian inference. It ensures that under suitable losses and priors, the Bayesian credible sets coincide with frequentist confidence sets, and the posteriors demonstrate desirable robustness properties including high breakdown points and bounded influence. The generality of the construction allows deployment in standard Bayesian models (such as mixtures and matrix factorization), yielding practical robustness improvements in the presence of data contamination or heavy tails (Marusic et al., 1 Oct 2025).
