M+ Framework: Robust Bayesian Inference
- The M⁺ framework is a robust Bayesian inference method that constructs posteriors from generic loss functions, subsuming standard Bayesian updating while adding resilience to outliers.
- It achieves asymptotic normality and credible set coverage analogous to the Bernstein–von Mises theorem, ensuring accurate frequentist properties under robust loss choices.
- Empirical examples, such as Dirichlet-process clustering and Poisson factorization, demonstrate its capacity to recover true clusters and maintain stability under data corruption.
The M⁺ (M-posterior) framework generalizes Bayesian posteriors by connecting Bayesian updating directly to the theory of M-estimators in the frequentist paradigm. It constructs posteriors using a generic loss function that defines an underlying M-estimator, subsuming the standard Bayesian posterior (recovered when the loss is the negative log likelihood) and extending Bayesian methods to a wide class of robust loss functions. The framework establishes foundational results on asymptotic normality, posterior contraction, frequentist coverage, and robustness, introducing the posterior influence function and a posterior breakdown point for rigorous characterization.
1. Definition and Formal Construction
Given i.i.d. data $X_1, \dots, X_n$ from a model $\{P_\theta : \theta \in \Theta \subseteq \mathbb{R}^d\}$ on a sample space $\mathcal{X}$ with a prior density $\pi$, the M-posterior is built from a contrast (loss) $\rho(\theta, x)$, which defines the M-estimator

$$\hat{\theta}_n = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n} \rho(\theta, X_i).$$

The corresponding M-posterior density is

$$\pi_n(\theta \mid X_{1:n}) = \frac{\exp\left(-\sum_{i=1}^{n} \rho(\theta, X_i)\right) \pi(\theta)}{Z_n},$$

where $Z_n = \int_\Theta \exp\left(-\sum_{i=1}^{n} \rho(\theta, X_i)\right) \pi(\theta)\, d\theta$. This equivalently rewrites as

$$\pi_n(\theta \mid X_{1:n}) \propto \exp\left(-n R_n(\theta)\right) \pi(\theta), \qquad R_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \rho(\theta, X_i).$$

Setting $\rho(\theta, x) = -\log p_\theta(x)$ recovers the standard Bayesian posterior. The framework thus unifies Bayesian updating and robust estimation via user-specified losses (Marusic et al., 1 Oct 2025).
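As a concrete illustration of this construction, the sketch below evaluates a one-dimensional location M-posterior on a grid, comparing the negative-log-likelihood loss (which recovers standard Bayes) with a Huber contrast. The data, flat prior, and tuning constant are illustrative assumptions of this sketch, not examples from the paper.

```python
import numpy as np

def huber(r, c=1.345):
    """Huber contrast: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def m_posterior(x, grid, loss):
    """Grid approximation of pi_n(theta) ∝ exp(-sum_i rho(theta, X_i)), flat prior."""
    logp = -loss(x[None, :] - grid[:, None]).sum(axis=1)
    logp -= logp.max()                 # stabilize before exponentiating
    dens = np.exp(logp)
    dx = grid[1] - grid[0]
    return dens / (dens.sum() * dx)    # normalize to integrate to 1

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 12.0)])  # 5% gross outliers
grid = np.linspace(-2.0, 4.0, 2001)
dx = grid[1] - grid[0]

# rho(theta, x) = (x - theta)^2 / 2 is the Gaussian negative log-likelihood
# (up to constants), so it reproduces the standard posterior.
bayes = m_posterior(x, grid, lambda r: 0.5 * r**2)
robust = m_posterior(x, grid, huber)

bayes_mean = (grid * bayes).sum() * dx    # dragged toward the outliers
robust_mean = (grid * robust).sum() * dx  # stays near the true location 0
print(f"Bayes mean: {bayes_mean:.3f}, M-posterior mean: {robust_mean:.3f}")
```

The only change between the two posteriors is the loss passed to `m_posterior`, which is exactly the unification the framework describes.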
2. Asymptotic Normality and the Bernstein–von Mises Theorem
The key asymptotic result states that, under regularity assumptions analogous to local asymptotic normality (WMLAN) and mild prior conditions:
- The quasi-log-likelihood ratio based on $\rho$ admits a second-order Taylor expansion about the target $\theta_0$:
$$\sum_{i=1}^{n} \left[\rho(\theta_0, X_i) - \rho(\theta_0 + h/\sqrt{n}, X_i)\right] = h^\top \Delta_n - \tfrac{1}{2}\, h^\top V_{\theta_0} h + o_P(1),$$
where $\Delta_n = n^{-1/2} \sum_{i=1}^{n} \psi(\theta_0, X_i)$, $\psi = -\nabla_\theta \rho$ is the score of the loss, and $V_{\theta_0}$ is the curvature (Hessian) of the expected loss at $\theta_0$.
- The M-posterior contracts at the root-$n$ rate and is asymptotically Gaussian:
$$\left\| \Pi_n(\cdot \mid X_{1:n}) - N\!\left(\hat{\theta}_n,\, (n V_{\theta_0})^{-1}\right) \right\|_{\mathrm{TV}} \to 0$$
in probability, provided the prior is positive and continuous near $\theta_0$. Thus, around $\hat{\theta}_n$, the M-posterior behaves as a normal distribution with variance $(n V_{\theta_0})^{-1}$, paralleling the Bernstein–von Mises theorem for classical Bayesian posteriors (Marusic et al., 1 Oct 2025).
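This BvM-type behavior can be checked numerically. The sketch below (my own illustration: Gaussian data, Huber loss, flat prior, and a plug-in curvature estimate) compares the spread of a grid-based M-posterior to the predicted $(n V)^{-1/2}$ scale.

```python
import numpy as np

c = 1.345
huber = lambda r: np.where(np.abs(r) <= c, 0.5 * r**2, c * np.abs(r) - 0.5 * c**2)

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(0.0, 1.0, n)

grid = np.linspace(-0.3, 0.3, 4001)
dx = grid[1] - grid[0]
logp = -huber(x[None, :] - grid[:, None]).sum(axis=1)  # flat prior
logp -= logp.max()
dens = np.exp(logp)
dens /= dens.sum() * dx

post_mean = (grid * dens).sum() * dx
post_sd = np.sqrt(((grid - post_mean) ** 2 * dens).sum() * dx)

# Plug-in curvature V = (1/n) sum_i psi'(X_i - theta_hat); for the Huber
# loss, psi'(r) = 1{|r| <= c}.
V = np.mean(np.abs(x - post_mean) <= c)
bvm_sd = 1.0 / np.sqrt(n * V)

print(f"posterior sd: {post_sd:.4f}, BvM prediction: {bvm_sd:.4f}")
```

At this sample size the two scales agree to within a few percent, consistent with the $(n V_{\theta_0})^{-1}$ variance in the theorem.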
3. Posterior Contraction and Frequentist Coverage
Consequences of asymptotic normality include:
- Root-$n$ consistency: for any sequence $M_n \to \infty$,
$$\Pi_n\left(\sqrt{n}\,\|\theta - \theta_0\| > M_n \,\middle|\, X_{1:n}\right) \to 0$$
in probability.
- Bayesian credible balls $C_n(\alpha)$ of nominal level $1 - \alpha$ satisfy
$$P_{\theta_0}\left(\theta_0 \in C_n(\alpha)\right) \to 1 - \alpha,$$
i.e., credible sets yield correct frequentist coverage under regularity, even for general choices of the loss $\rho$ (Marusic et al., 1 Oct 2025).
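A small Monte Carlo sketch of this coverage property (simulation design, model, loss, and sample sizes are all illustrative assumptions of mine):

```python
import numpy as np

c = 1.345
huber = lambda r: np.where(np.abs(r) <= c, 0.5 * r**2, c * np.abs(r) - 0.5 * c**2)

def credible_interval(x, grid, alpha=0.10):
    """Equal-tailed (1 - alpha) credible interval of the Huber M-posterior."""
    logp = -huber(x[None, :] - grid[:, None]).sum(axis=1)  # flat prior
    logp -= logp.max()
    dens = np.exp(logp)
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    lo = grid[np.searchsorted(cdf, alpha / 2)]
    hi = grid[np.searchsorted(cdf, 1 - alpha / 2)]
    return lo, hi

rng = np.random.default_rng(2)
grid = np.linspace(-1.0, 1.0, 2001)
theta0 = 0.0
reps = 300
hits = 0
for _ in range(reps):
    x = rng.normal(theta0, 1.0, 200)
    lo, hi = credible_interval(x, grid)
    hits += (lo <= theta0 <= hi)
print(f"empirical coverage of 90% credible intervals: {hits / reps:.3f}")
```

The empirical coverage lands near the nominal 90%, as the theory predicts for this regular case.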
4. Robustness: Posterior Influence Function and Breakdown Point
Posterior Influence Function (PIF)
For contamination $P_\varepsilon = (1 - \varepsilon) P + \varepsilon\, \delta_z$, the PIF for the posterior density is
$$\mathrm{PIF}(z, \theta) = \lim_{\varepsilon \downarrow 0} \frac{\pi_{\varepsilon, z}(\theta) - \pi_0(\theta)}{\varepsilon},$$
where $\pi_{\varepsilon, z}$ denotes the M-posterior under $P_\varepsilon$ and $\pi_0$ the uncontaminated posterior, and can be computed as
$$\mathrm{PIF}(z, \theta) = n\, \pi_0(\theta) \left[ \left(\bar{\rho}(\theta) - \rho(\theta, z)\right) - \mathbb{E}_{\pi_0}\!\left[\bar{\rho}(\vartheta) - \rho(\vartheta, z)\right] \right],$$
where $\bar{\rho}(\theta) = \mathbb{E}_P[\rho(\theta, X)]$. Boundedness of the score $\psi$ and mild priors ensure that $\sup_{z, \theta} |\mathrm{PIF}(z, \theta)| < \infty$. For unbounded $\psi$, such as the quadratic loss, the PIF diverges (Marusic et al., 1 Oct 2025).
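The bounded-versus-unbounded-score contrast shows up clearly in simulation. The sketch below approximates the PIF by a finite difference on an empirical M-posterior (a simplification of the population definition; data, losses, and the step size $\varepsilon$ are my illustrative choices):

```python
import numpy as np

c = 1.345
huber = lambda r: np.where(np.abs(r) <= c, 0.5 * r**2, c * np.abs(r) - 0.5 * c**2)
quad = lambda r: 0.5 * r**2

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 200)
grid = np.linspace(-5.0, 5.0, 2001)
dx = grid[1] - grid[0]

def posterior(loss, eps, z):
    """M-posterior with an eps-fraction of the loss moved onto a point mass at z."""
    n = len(x)
    R = loss(x[None, :] - grid[:, None]).mean(axis=1)  # empirical risk R_n(theta)
    logp = -n * ((1 - eps) * R + eps * loss(z - grid))
    logp -= logp.max()
    dens = np.exp(logp)
    return dens / (dens.sum() * dx)

def pif_sup(loss, z, eps=1e-3):
    """Finite-difference approximation of sup_theta |PIF(z, theta)|."""
    return np.abs(posterior(loss, eps, z) - posterior(loss, 0.0, z)).max() / eps

for z in [5.0, 50.0, 500.0]:
    print(f"z = {z:6.0f}: Huber sup-PIF = {pif_sup(huber, z):10.2f}, "
          f"quadratic sup-PIF = {pif_sup(quad, z):12.2f}")
```

The Huber (bounded-score) influence stabilizes as the contamination point $z$ grows, while the quadratic-loss influence keeps increasing with $z$.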
Posterior Breakdown Point
Let $W_2$ denote the 2-Wasserstein distance. The posterior breakdown point is
$$\varepsilon^*(\Pi) = \inf\left\{ \varepsilon \in (0, 1) : \sup_{Q}\, W_2\!\left(\Pi_{(1-\varepsilon) P + \varepsilon Q},\; \Pi_P\right) = \infty \right\},$$
the smallest contamination fraction at which the posterior can be driven arbitrarily far from its uncontaminated counterpart.
Results for location M-posteriors ($\rho$ convex with bounded score $\psi$):
- Flat prior: $\varepsilon^* = 1/2$, matching the M-estimator's breakdown point.
- Exponential-like prior tails: $\varepsilon^*$ exceeds $1/2$ by a prior-dependent margin, approaching $1/2$ as the prior's influence vanishes.
- Super-exponential prior tails: no breakdown (the posterior cannot be driven off to infinity).

Redescending losses (bounded $\rho$ and $\psi$) likewise admit a positive breakdown point, equal to exactly $1/2$ for a flat prior (Marusic et al., 1 Oct 2025).
Posterior means and quantiles inherit the breakdown point, i.e., $\varepsilon^*(\mathbb{E}_{\Pi}[\theta]) \geq \varepsilon^*(\Pi)$, and similarly for posterior quantiles.
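The flat-prior breakdown at $1/2$ can be illustrated directly: below $1/2$ contamination the Huber M-posterior mean stays near the truth, above it the mean follows the contamination. This is a simulation of mine, with a flat prior over a bounded grid.

```python
import numpy as np

c = 1.345
huber = lambda r: np.where(np.abs(r) <= c, 0.5 * r**2, c * np.abs(r) - 0.5 * c**2)

rng = np.random.default_rng(4)
grid = np.linspace(-10.0, 60.0, 7001)
dx = grid[1] - grid[0]

def posterior_mean(eps, z=50.0, n=200):
    """Huber M-posterior mean (flat prior) with an eps-fraction of points at z."""
    k = int(eps * n)
    x = np.concatenate([rng.normal(0.0, 1.0, n - k), np.full(k, z)])
    logp = -huber(x[None, :] - grid[:, None]).sum(axis=1)
    logp -= logp.max()
    dens = np.exp(logp)
    dens /= dens.sum() * dx
    return (grid * dens).sum() * dx

print(f"30% contamination: posterior mean ≈ {posterior_mean(0.30):.2f}")  # stays near 0
print(f"60% contamination: posterior mean ≈ {posterior_mean(0.60):.2f}")  # breaks toward z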
5. Illustrative Applications and Numerical Comparisons
Dirichlet-Process Mixture for Clustering
Using data simulated from three skew-normal components, a Dirichlet-process Gaussian mixture clustering experiment demonstrates:
- Standard Gaussian likelihood posteriors yield 5 spurious clusters due to sensitivity to outliers.
- The M-posterior with a Huber-type loss on standardized residuals accurately recovers the 3 true clusters, highlighting robustness (Marusic et al., 1 Oct 2025).
Robust Poisson Factorization for Recommendation
In a Poisson factorization of the MovieLens 1M dataset under $0$–$10\%$ random corruption, three approaches are compared:
- (a) Standard posterior (no reweighting).
- (b) Reweighted posterior (with latent per-observation weights).
- (c) M-posterior with the induced robust loss.
Empirical results (negative out-of-sample log-likelihood; lower is better):
| Corruption | Standard | Reweighted | M-posterior |
|---|---|---|---|
| 0% | 1.724 | 1.690 | 1.689 |
| 5% | 1.739 | 1.725 | 1.727 |
| 10% | 1.758 | 1.746 | 1.748 |
The M-posterior matches the robustness of the reweighted approach without introducing additional latent weight variables (Marusic et al., 1 Oct 2025).
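The same robustification idea can be sketched in a single-rate toy version of a Poisson model. The Huberized deviance residual below is a generic robust choice of mine for illustration, not necessarily the induced loss used in the paper.

```python
import numpy as np

c = 1.345
def huber(r):
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

rng = np.random.default_rng(5)
y = rng.poisson(3.0, 500)
y[:25] = 100          # corrupt 5% of the counts

lam = np.linspace(0.5, 15.0, 2901)

# Standard Poisson negative log-likelihood (up to constants): lam - y*log(lam).
nll = lam[:, None] - y[None, :] * np.log(lam[:, None])
mle = lam[nll.sum(axis=1).argmin()]

# Signed deviance residuals, passed through a Huber contrast (generic
# robustification; the paper's induced loss may differ).
with np.errstate(divide="ignore", invalid="ignore"):
    term = np.where(y[None, :] > 0, y[None, :] * np.log(y[None, :] / lam[:, None]), 0.0)
dev = 2.0 * (term - (y[None, :] - lam[:, None]))
r = np.sign(y[None, :] - lam[:, None]) * np.sqrt(np.maximum(dev, 0.0))
robust = lam[huber(r).sum(axis=1).argmin()]

print(f"MLE rate (corrupted data): {mle:.2f}")     # pulled up by the corrupted counts
print(f"Robust M-estimate of rate: {robust:.2f}")  # stays near the true rate 3
```

As in the table above, the loss-based estimate resists the corrupted entries without any latent weight variables.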
6. Broader Implications and Applicability
The M⁺ framework encompasses a large class of generalized posteriors defined by convex or redescending losses, under mild regularity requirements, and provides a unified theoretical foundation for robust Bayesian inference. It ensures that under suitable losses and priors, the Bayesian credible sets coincide with frequentist confidence sets, and the posteriors demonstrate desirable robustness properties including high breakdown points and bounded influence. The generality of the construction allows deployment in standard Bayesian models (such as mixtures and matrix factorization), yielding practical robustness improvements in the presence of data contamination or heavy tails (Marusic et al., 1 Oct 2025).