
M-Posteriors: Robust Bayesian Inference

Updated 4 October 2025
  • M-Posteriors are generalized Bayesian posteriors that replace the log-likelihood with a loss function, enabling robust inference akin to M-estimators.
  • They exhibit strong frequentist properties including consistency, asymptotic normality, and a Bernstein–von Mises theorem adaptation for reliable uncertainty quantification.
  • Applications in robust location models, quantile regression, and mixture modeling highlight their effectiveness in handling outliers and model misspecifications.

An M-posterior is a generalized Bayesian posterior distribution defined via a loss function, in direct analogy to the way M-estimators generalize maximum likelihood estimators in frequentist statistics. Rather than updating beliefs with a log-likelihood, the M-posterior framework exponentiates the negative of an empirical loss (or risk) function evaluated over the data, applies a prior, and produces a posterior that may be centered around traditional M-estimators such as those based on robust or quantile losses. This construction is intentionally broad, encompassing many generalized Bayesian posteriors (such as α-posteriors and power posteriors), and is designed to inherit both robustness and desirable frequentist properties from the underlying M-estimator. Recent work has provided a rigorous theoretical foundation for the use of M-posteriors, establishing their consistency, asymptotic normality, and precise robustness guarantees, and demonstrating their application across a range of models including robust location, quantile regression, and robust mixture modeling (Marusic et al., 1 Oct 2025).

1. The Generalized Bayesian M-Posterior Construction

The core construction of an M-posterior replaces the log-likelihood in Bayes’ theorem with a general loss (or risk) function ρ(x, θ), yielding a “Gibbs posterior” of the form

$$\pi(\theta \mid F_n) \propto \exp\bigl\{-n\,\mathbb{E}_{F_n}\bigl[\rho(X,\theta)\bigr]\bigr\}\,\pi(\theta),$$

where $F_n$ is the empirical distribution of the sample and $\pi(\theta)$ is a prior. The empirical average risk replaces the (negative) average log-likelihood, giving the update mechanism of a generalized Bayesian posterior.

When $\rho(x,\theta) = -\log f(x \mid \theta)$ (i.e., the negative log-likelihood), this returns the standard Bayesian posterior. For arbitrary ρ, the M-posterior focuses on parameter values θ that deliver low average (empirical) loss, weighted by the prior. Central to the framework is the direct connection to the corresponding M-estimator, $\hat{\theta}_n = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^n \rho(X_i,\theta)$, which, in the robust statistics literature, is the target of the M-posterior’s concentration.

This construction is flexible: by choice of ρ, the M-posterior may encode robustness to outliers, tailor inference to quantiles, or adjust for heavy-tailed data.
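As a concrete illustration (not drawn from the cited paper), the sketch below evaluates a one-dimensional location M-posterior on a grid using the Huber loss and a flat prior; the tuning constant, grid, and toy data are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def huber_loss(r, c=1.345):
    """Huber loss: quadratic for small residuals, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def m_posterior_grid(x, theta_grid, loss=huber_loss, log_prior=None):
    """Grid approximation of pi(theta | F_n) ~ exp{-n * mean_i rho(x_i, theta)} * pi(theta)."""
    n = len(x)
    if log_prior is None:
        log_prior = lambda t: np.zeros_like(t)      # flat prior
    avg_risk = np.array([loss(x - t).mean() for t in theta_grid])
    log_post = -n * avg_risk + log_prior(theta_grid)
    return log_post - logsumexp(log_post)           # normalized over the grid points

# Toy data: 100 standard-normal draws plus one gross outlier
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), [50.0]])
grid = np.linspace(-2.0, 2.0, 2001)
log_post = m_posterior_grid(x, grid)
post = np.exp(log_post)
print("M-posterior mean:", np.sum(grid * post))     # stays near 0 despite the outlier
```

With the single outlier at 50, a flat-prior Gaussian-likelihood posterior mean would sit near the sample mean of roughly 0.5, whereas the Huber M-posterior mean stays near the bulk of the data.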

2. Frequentist Asymptotics and Bernstein–von Mises Theorem

Under standard local asymptotic normality conditions for the loss function (termed Weighted M-LAN in the generalized setting), and assuming concentration properties for the posterior, the M-posterior distribution admits a Bernstein–von Mises (BvM) theorem:

$$d_{TV}\bigl(\pi(\theta \mid F_n),\ \mathcal{N}\bigl(\hat{\theta}_n,\ V_{\theta^*}^{-1}/(n\bar{\alpha})\bigr)\bigr) \rightarrow 0,$$

where $\hat{\theta}_n$ is the (weighted) M-estimator, $V_{\theta^*}$ is a curvature (second-derivative) matrix of the expected loss at the true parameter $\theta^*$, and $\bar{\alpha}$ is an average weight if data reweighting is employed (Marusic et al., 1 Oct 2025).

As a result, the M-posterior is consistent and contracts at the standard $\sqrt{n}$ parametric rate around the M-estimator, and its asymptotic variance matches that of the reference M-estimator. This generalizes the classical Bayesian BvM result by freeing the loss function from strict likelihood specification, allowing for more general (including robust) inferences.
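Continuing the illustrative grid sketch from Section 1 (so x, grid, post, and log_post are the objects defined there, with no data reweighting, i.e. $\bar{\alpha}=1$), the lines below compare the M-posterior’s spread with the BvM normal approximation, estimating the curvature $V_{\theta^*}$ by the empirical second derivative of the Huber loss at the posterior mode. This is a rough numerical check, not a formal statement of the theorem.

```python
# Continues the grid sketch above; theta_hat proxies the Huber M-estimator,
# and V_hat estimates the curvature of the expected loss at that point.
c = 1.345
theta_hat = grid[np.argmax(log_post)]            # posterior mode ~ M-estimate under a flat prior
V_hat = np.mean(np.abs(x - theta_hat) <= c)      # mean of rho''(x_i - theta_hat) for the Huber loss
post_mean = np.sum(grid * post)
post_sd = np.sqrt(np.sum((grid - post_mean) ** 2 * post))
bvm_sd = np.sqrt(1.0 / (len(x) * V_hat))         # sqrt of V^{-1} / (n * alpha_bar) with alpha_bar = 1
print(f"M-posterior sd: {post_sd:.4f}  vs  BvM sd: {bvm_sd:.4f}")
```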

3. Robustness: Posterior Influence Function and Breakdown Point

M-posteriors are designed to be robust, with their robustness properties formalized via the posterior influence function (PIF) and a novel posterior breakdown point concept.

Posterior Influence Function (PIF)

For contaminated empirical distributions $F_{n,\epsilon,x_0} = (1-\epsilon)F_n + \epsilon\,\delta_{x_0}$, the PIF is defined as the Gateaux derivative

$$\mathrm{PIF}(x_0; \theta, \rho, F_n) = \left.\frac{d}{d\epsilon}\right|_{\epsilon=0} \pi(\theta \mid F_{n,\epsilon,x_0}).$$

When the derivative of ρ (the score function) is bounded and the prior is suitably chosen, the PIF is uniformly bounded, indicating local insensitivity to typical outliers and mirroring frequentist results for robust M-estimators.
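A crude numerical analogue (again continuing the illustrative grid sketch, with huber_loss, x, and grid defined there) replaces the Gateaux derivative by a finite difference in $\epsilon$ and tracks the posterior mean rather than the full density-valued PIF; the contamination points and step size are arbitrary choices.

```python
# Finite-difference proxy for the influence of contamination at x0 on the M-posterior mean.
def m_posterior_contaminated(x, theta_grid, x0, eps, loss=huber_loss):
    """Grid M-posterior under F_{n,eps,x0} = (1 - eps) F_n + eps * delta_{x0} (flat prior)."""
    n = len(x)
    risk = np.array([(1 - eps) * loss(x - t).mean() + eps * loss(x0 - t)
                     for t in theta_grid])
    log_post = -n * risk
    return np.exp(log_post - logsumexp(log_post))

def influence_of_mean(x, theta_grid, x0, eps=1e-4):
    """(posterior mean under eps-contamination at x0 minus uncontaminated mean) / eps."""
    p0 = m_posterior_contaminated(x, theta_grid, x0, 0.0)
    p1 = m_posterior_contaminated(x, theta_grid, x0, eps)
    return (np.sum(theta_grid * p1) - np.sum(theta_grid * p0)) / eps

for x0 in [2.0, 10.0, 1000.0]:
    print(x0, round(influence_of_mean(x, grid, x0), 3))   # saturates for large x0: bounded influence
```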

Breakdown Point

A finite-sample posterior breakdown point is defined via the 2-Wasserstein distance:

$$\varepsilon^*_{W_2}(\pi^\rho_n, X^n) := \min\left\{ \frac{m}{n} : \sup_{F_{(n,m)} \in \mathcal{F}_{(n,m)}} W_2\bigl(\pi^\rho(\cdot \mid F_{(n,m)}),\, \pi^\rho(\cdot \mid F_n)\bigr) = \infty \right\},$$

where $\mathcal{F}_{(n,m)}$ denotes the set of empirical distributions obtained by replacing $m$ of the $n$ observations arbitrarily. With certain robust losses (e.g., Huber) and flat priors, the breakdown point reaches 1/2, exactly as for the corresponding frequentist estimator. For light-tailed priors, no breakdown occurs; the prior dominates.

4. Empirical Examples: Robustness and Applicability

Empirical demonstrations in the framework highlight the efficacy of M-posteriors in several models:

  • Huber Location Posterior: Using the Huber loss, the M-posterior is centered on the robust Huber M-estimator, achieving resistance to outliers. Compared to the standard likelihood-based posterior, the M-posterior’s posterior mean and credible sets remain stable in the presence of extreme data, as confirmed by numerical studies.
  • Bayesian Quantile Regression: Employing the check loss produces an M-posterior targeting conditional quantiles, with the posterior centered around the quantile regression estimator. Robustness and asymptotic normality are inherited from the properties of the underlying quantile loss (see the sketch after this list).
  • Poisson Factorization and Clustering: For matrix factorization and clustering problems, robust M-posteriors (constructed via robust losses or data reweighting) avoid common pitfalls, such as the emergence of spurious clusters or over-sensitivity to anomalous users, which can otherwise afflict mixture models fit via conventional likelihood posteriors.
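The sketch below (referenced in the quantile-regression item above) samples a check-loss M-posterior for a single regression slope with a random-walk Metropolis kernel; the flat prior, data-generating process, and tuning values are illustrative assumptions rather than the setup of the cited paper.

```python
import numpy as np

def check_loss(r, tau=0.5):
    """Quantile (check) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (r < 0))

def quantile_m_posterior(xc, y, tau=0.5, n_iter=5000, step=0.1, seed=1):
    """Random-walk Metropolis on the check-loss M-posterior for a single slope (flat prior)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    log_post = lambda b: -n * check_loss(y - b * xc, tau).mean()
    beta = 0.0
    cur = log_post(beta)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = beta + step * rng.normal()
        lp = log_post(prop)
        if np.log(rng.random()) < lp - cur:     # Metropolis accept/reject step
            beta, cur = prop, lp
        draws[i] = beta
    return draws

rng = np.random.default_rng(2)
xc = rng.normal(size=200)
y = 1.5 * xc + rng.standard_t(df=2, size=200)          # heavy-tailed noise around the median
draws = quantile_m_posterior(xc, y, tau=0.5)
print("posterior median of slope:", np.median(draws[1000:]))   # should land near 1.5
```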

5. Robust Posterior Inference: Methodological Implications

The generalization from likelihood to arbitrary loss-based updating allows M-posteriors to address challenges in statistical modeling far beyond the reach of standard Bayesian methods:

  • Reweighting for Robustness: The framework accommodates latent-data reweighting (e.g., using a Gamma prior for data weights), leading to “robustified” M-posteriors with bounded influence, particularly valuable when data contamination is suspected; a minimal sketch follows this list.
  • Choice of Prior and Tail Behavior: The tradeoff between robustness and prior informativeness is explicit: uninformative (flat) priors allow the loss function’s robustness properties to be fully translated into the posterior, whereas strong (light-tailed) priors can suppress breakdown effects.
  • Unified Treatment of α- and Power Posteriors: By suitable selection of ρ, the M-posterior framework subsumes power posteriors (where the loss is a scaled negative log-likelihood) and other generalizations, enabling fine-grained control over contraction rates and robustness.
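As a deliberately simplified illustration of reweighting (referenced in the first item above), the sketch below plugs fixed Huber-type weights from a pilot fit into a weighted Gibbs posterior; a latent-weight scheme with Gamma priors, as mentioned in the cited work, would instead treat the weights as random. The average weight plays the role of $\bar{\alpha}$ in the BvM covariance.

```python
import numpy as np
from scipy.special import logsumexp

# Illustrative only: fixed Huber-type weights from a crude pilot fit, then a weighted
# squared-error Gibbs posterior on a grid with a flat prior.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0.0, 1.0, 100), [50.0, 60.0]])   # two gross outliers
c = 1.345
res = x - np.median(x)                                   # residuals from a pilot location estimate
w = np.minimum(1.0, c / np.maximum(np.abs(res), 1e-12))  # weights in (0, 1], small for outliers
alpha_bar = w.mean()                                     # average weight entering V^{-1}/(n * alpha_bar)

grid = np.linspace(-2.0, 2.0, 2001)
log_post = np.array([-0.5 * np.sum(w * (x - t) ** 2) for t in grid])
post = np.exp(log_post - logsumexp(log_post))
print("reweighted posterior mean:", np.sum(grid * post), " average weight:", alpha_bar)
```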

6. Applications and Broader Relevance

Applications of M-posteriors span robust estimation in location models, quantile regression, robust mixture and factor models, and high-dimensional or hierarchical settings. In each, the use of a suitable loss function within the M-posterior delivers both frequentist and Bayesian guarantees, addressing resistance to outliers or model misspecification.

  • Robust location models: the M-posterior attains a 1/2 breakdown point in the classical location model when paired with a robust loss such as the Huber loss.
  • Quantile regression: Direct targeting of conditional quantiles with transparent frequentist interpretation.
  • Mixture models: Robustness to aberrant data prevents ghost clusters and enhances stability in unsupervised learning tasks.

In all cases, the M-posterior offers a path to robust, computationally tractable Bayesian inference that is centered around estimators with sound frequentist properties, and whose uncertainty quantification (via credible sets) remains meaningful even under moderate data contamination (Marusic et al., 1 Oct 2025).


The M-posterior framework thus serves as a unifying methodology in Bayesian inference, providing both theoretical and practical pathways to robust, consistent, and interpretable statistical analysis across a wide diversity of models and application areas.
