
Recursive Marginal Likelihood Estimators

Updated 15 January 2026
  • Recursive marginal likelihood estimators are iterative algorithms that approximate Bayesian evidence by constructing intermediate distributions between the prior and posterior.
  • They utilize methods such as predictive recursion, stochastic gradient annealed importance sampling, and bridge sampling to handle high-dimensional and latent variable models.
  • Their efficient computational design and proven convergence guarantees make them vital for robust Bayesian model selection and evidence computation.

Recursive marginal likelihood estimators comprise a class of algorithms designed to estimate the marginal likelihood (evidence) of Bayesian models via iterative, sequential, or recursive procedures. They are central to Bayesian model selection, apply to high-dimensional latent variable models and semiparametric mixture models, and make evidence computation practical where direct integration is infeasible. The recursive approach leverages filtering, sequential Monte Carlo, or stochastic approximation techniques to exploit the structure of the model or data and to update estimates efficiently as data or parameters change.

1. Theoretical Foundations and Model Setup

Marginal likelihood estimation requires computing $Z = \int_\Omega L(y\mid\theta)\,\pi(\theta)\,d\theta$, the normalizing constant of the product of the likelihood and prior in Bayesian inference. When the likelihood is intractable or the model is high-dimensional, naïve numerical quadrature is prohibitive. Recursive marginal likelihood estimators address this by constructing a sequence of intermediate measures or densities that bridge the prior and posterior.

Common model structures targeted by recursive estimators include:

  • Mixture models: $m_{f,\theta}(y) = \int_{\mathcal{U}} p(y\mid\theta,u)\, f(u)\,d\mu(u)$ with structural parameter $\theta$ and nonparametric mixing density $f$ (Martin et al., 2011).
  • Latent variable models: $p(y,x\mid\theta)$ with integration over latent variables $x\in\mathbb{R}^d$ (Bortoli et al., 2019).
  • Sequential data models: i.i.d. or conditionally independent $y_{1:N}$, amenable to product-rule factorization of the evidence (Cameron et al., 2019).

2. Major Recursive Marginal Likelihood Estimation Methods

2.1 Predictive Recursion Marginal Likelihood (PRML)

PRML refines the estimation of the structural parameter in semiparametric mixture models. For fixed $\theta$, the predictive recursion (PR) algorithm updates the unknown mixing density $f$ sequentially:

For $i=1,\ldots,n$:

  1. Compute the predictive mixture at $Y_i$: $m_{i-1,\theta}(Y_i) = \int p(Y_i\mid\theta,u)\, f_{i-1}(u)\,d\mu(u)$.
  2. Update: $f_i(u) = (1-w_i)\,f_{i-1}(u) + w_i\,\frac{p(Y_i\mid\theta,u)\,f_{i-1}(u)}{m_{i-1,\theta}(Y_i)}$.

The PR marginal likelihood is then built as $L_n(\theta) = \prod_{i=1}^n m_{i-1,\theta}(Y_i)$ (Martin et al., 2011).
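As a concrete illustration, the recursion above can be sketched on a fixed quadrature grid for a toy one-dimensional normal location mixture with no structural parameter; the grid, kernel, and weight sequence are illustrative choices, not part of the original specification:

```python
import numpy as np

def pr_marginal_likelihood(y, u_grid, kernel, gamma=2/3):
    """Predictive recursion on a fixed grid over the mixing space U.

    y       : 1-D array of observations
    u_grid  : uniform grid over U (toy 1-D case)
    kernel  : kernel(y_i, u_grid) -> p(y_i | u) evaluated on the grid
    Returns the log PR marginal likelihood and the final mixing density.
    """
    du = u_grid[1] - u_grid[0]
    f = np.full_like(u_grid, 1.0 / (u_grid[-1] - u_grid[0]))  # uniform f_0
    log_L = 0.0
    for i, yi in enumerate(y, start=1):
        p = kernel(yi, u_grid)             # p(y_i | u) on the grid
        m = np.sum(p * f) * du             # predictive m_{i-1}(y_i), Riemann sum
        w = (i + 1.0) ** (-gamma)          # weight w_i = (i+1)^(-2/3)
        f = (1.0 - w) * f + w * p * f / m  # PR mixing-density update
        log_L += np.log(m)                 # accumulate log L_n
    return log_L, f

# Toy data: two-component normal location mixture, unit-variance kernel
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
u = np.linspace(-6, 6, 601)
kern = lambda yi, ug: np.exp(-0.5 * (yi - ug) ** 2) / np.sqrt(2 * np.pi)
log_L, f = pr_marginal_likelihood(y, u, kern)
```

Because the predictive $m$ is computed with the same quadrature rule as the update, each recursion step keeps the grid approximation of $\int f_i\,d\mu$ at one, which is a useful sanity check in practice.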

2.2 Sequential Recursive Importance and Annealing

The marginal likelihood is factorized as $p(y_{1:n}) = \prod_{i=1}^n p(y_i\mid y_{<i})$, with each predictive $p(y_i\mid y_{<i}) = \int p(y_i\mid\theta)\,p(\theta\mid y_{<i})\,d\theta$ approximated recursively. Stochastic Gradient Annealed Importance Sampling (SGAIS) interleaves annealed importance sampling (AIS) with stochastic gradient MCMC for online, chunk-wise estimation, using unbiased mini-batch approximations and adaptive temperature scheduling (Cameron et al., 2019).
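The product-rule factorization can be checked exactly in a conjugate setting, where each predictive $p(y_i\mid y_{<i})$ has closed form. The sketch below uses a hypothetical normal–normal model (not taken from the cited papers) to verify that the product of one-step predictives recovers the joint marginal likelihood:

```python
import numpy as np

def log_evidence_recursive(y, sigma2=1.0, tau2=4.0):
    """log p(y_{1:n}) = sum_i log p(y_i | y_{<i}) via exact conjugate predictives."""
    log_Z, mean, var = 0.0, 0.0, tau2          # prior theta ~ N(0, tau2)
    for yi in y:
        pred_var = var + sigma2                # variance of y_i | y_{<i}
        log_Z += -0.5 * (np.log(2 * np.pi * pred_var)
                         + (yi - mean) ** 2 / pred_var)
        prec = 1.0 / var + 1.0 / sigma2        # conjugate posterior update
        mean = (mean / var + yi / sigma2) / prec
        var = 1.0 / prec
    return log_Z

def log_evidence_direct(y, sigma2=1.0, tau2=4.0):
    """Same quantity from the joint marginal y ~ N(0, sigma2*I + tau2*J)."""
    n, s = len(y), np.sum(y)
    log_det = n * np.log(sigma2) + np.log(1 + n * tau2 / sigma2)
    quad = (np.dot(y, y)
            - (tau2 / sigma2) / (1 + n * tau2 / sigma2) * s ** 2) / sigma2
    return -0.5 * (n * np.log(2 * np.pi) + log_det + quad)

rng = np.random.default_rng(1)
y = rng.normal(1.5, 1.0, size=20)
```

In non-conjugate models the predictives are not available in closed form, which is exactly the gap that the recursive Monte Carlo approximations described here fill.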

2.3 Recursive Bridge Sampling and Biased Sampling

Recursive estimation of normalization constants $Z_j$ for intermediate "bridging" densities $F_j(\theta) \propto w_j(\theta)\,\pi(\theta)$, spanning from prior to posterior, is governed by the core identity:

$$\hat{Z}_k \leftarrow \sum_{j=1}^m \sum_{i=1}^{n_j} \frac{w_k(\theta_i^{(j)})}{\sum_{s=1}^m n_s\, w_s(\theta_i^{(j)})/\hat{Z}_s},$$

where $\theta_i^{(j)} \sim F_j$ (Cameron et al., 2013). This identity covers the "biased sampling," "reverse logistic regression," and "density of states" methodologies.
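A minimal sketch of this fixed-point recursion, assuming Gaussian bridging functions against a standard normal prior so that the true $Z_j$ are known in closed form (the specific densities are illustrative, not those of Cameron et al.):

```python
import numpy as np

rng = np.random.default_rng(2)

# Bridging functions w_j(theta) = exp(-theta^2 / (2 s_j^2)) against a N(0,1)
# prior give Z_j = sqrt(s_j^2/(s_j^2+1)) in closed form, with F_j = N(0, Z_j^2).
# The prior itself (w = 1, Z = 1) is included as density 0 to pin the scale.
s2 = np.array([4.0, 1.0, 0.25])
true_Z = np.concatenate([[1.0], np.sqrt(s2 / (s2 + 1.0))])
var = np.concatenate([[1.0], s2 / (s2 + 1.0)])     # variances of F_0..F_3
n_j = 2000
theta = np.concatenate([rng.normal(0.0, np.sqrt(v), n_j) for v in var])

# W[j, i] = w_j(theta_i); row 0 is the prior's constant weight function
W = np.vstack([np.ones_like(theta)]
              + [np.exp(-theta ** 2 / (2.0 * s)) for s in s2])

Z = np.ones(len(var))                              # initialize all Z_j at 1
for _ in range(200):                               # fixed-point recursion
    denom = (n_j * W / Z[:, None]).sum(axis=0)     # sum_s n_s w_s(theta_i)/Z_s
    Z = (W / denom).sum(axis=1)                    # the core identity above
    Z /= Z[0]                                      # renormalize against the prior
```

Pooling all draws in the denominator is what makes the recursion statistically efficient: every sample informs every $\hat{Z}_k$, provided the bridging densities overlap.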

2.4 Stochastic Approximation via Unadjusted Langevin Monte Carlo (ULMC)

Intractable expectations in the evidence gradient are replaced by Monte Carlo averages from ULMC chains targeting the posterior of the latent variables $x$ given $y,\theta$, with $\theta$ updated iteratively by Robbins–Monro stochastic approximation:

$$\theta_{n+1} = \Pi_\Theta\!\left[\theta_n + \gamma_{n+1}\left(\hat{a}_n(\theta_n) - \nabla g(\theta_n)\right)\right],$$

where $\hat{a}_n(\theta_n)$ is the Monte Carlo estimate of the gradient (Bortoli et al., 2019).
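The update can be sketched for a toy latent Gaussian model where the maximum marginal likelihood estimate is known in closed form; the model, step sizes, and chain lengths are illustrative assumptions, and the projection $\Pi_\Theta$ is omitted because the parameter space here is all of $\mathbb{R}$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy latent-variable model: x_i ~ N(theta, 1), y_i | x_i ~ N(x_i, 1),
# so marginally y_i ~ N(theta, 2) and the maximizer of the marginal
# likelihood is theta* = mean(y); the recursion recovers it numerically.
y = rng.normal(2.0, np.sqrt(2.0), size=100)

theta = 0.0
x = np.zeros_like(y)                      # one latent variable per observation
h = 0.05                                  # ULA step size for the inner chain
for n in range(1, 501):
    for _ in range(20):                   # unadjusted Langevin steps on p(x | y, theta)
        grad_x = (y - x) + (theta - x)    # grad_x log p(y, x | theta)
        x = x + h * grad_x + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
    a_hat = np.mean(x - theta)            # MC estimate of the (averaged) gradient
    gamma = 2.0 / (n + 10.0)              # Robbins-Monro step sizes, divergent sum
    theta = theta + gamma * a_hat         # stochastic approximation update
```

The inner chain is deliberately left unadjusted (no Metropolis correction); its small discretization bias is traded against cheap iterations, which is the design choice the ULMC approach formalizes.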

3. Convergence and Theoretical Guarantees

Table: Convergence Properties of Key Recursive Marginal Likelihood Estimators

| Method | Main Convergence Guarantee | Reference |
| --- | --- | --- |
| Predictive recursion (PRML) | Pointwise: $(1/n)\log L_n(\theta) \to -K^*(\theta)$ a.s.; $\hat\theta_n \to \theta^*$ | (Martin et al., 2011) |
| Biased/reverse LR recursion | Asymptotic normality; all $Z_j$ converge jointly | (Cameron et al., 2013) |
| SGAIS | Unbiasedness; variance decays $O(M^{-1})$ with $M$ particles | (Cameron et al., 2019) |
| Stochastic approx. (ULMC) | a.s. convergence of $(\theta_n)$ to the maximizer $\theta^*$, with explicit rate | (Bortoli et al., 2019) |

For predictive recursion, under conditions on the kernel $p$, compactness of $\mathcal{U}$, decay of the weights $w_i$, and a second-moment condition, one obtains almost sure pointwise convergence and, under uniqueness, Wald consistency and explicit rates. The stochastic gradient estimators provide non-asymptotic convergence bounds, with rates matching standard stochastic gradient theory given suitable step-size sequences and kernels.

4. Algorithmic Realizations and Practical Considerations

Recursive estimators achieve favorable computational complexity:

  • PRML: $O(n)$ per candidate $\theta$ in mixture models, with weight decay parameter (commonly $w_i = (i+1)^{-\gamma}$, $\gamma \approx 2/3$).
  • Biased sampling/RLR: cost $O(m \sum_j n_j)$ for $m$ bridging densities and $n_j$ samples per density; draws must be i.i.d.
  • SGAIS: cost per data chunk scales as $O(K_n M |B|)$ for $K_n$ annealing steps, $M$ particles, and mini-batch size $|B|$, but does not grow with $n$ (Cameron et al., 2019).
  • Stochastic approximation with ULMC: $O(m_n C_\text{grad})$ per iteration, where $m_n$ is the number of inner-loop ULMC steps.

Key steps in algorithmic implementation include:

  • Choice of weight or step-size sequences to control convergence and stability.
  • Ensuring sufficient overlap of bridging densities (e.g., power posteriors or partial-data posteriors) (Cameron et al., 2013).
  • Adaptive annealing schedules to maintain effective sample size in SGAIS.
  • Warm-starting in high-dimensional latent variable models for computational efficiency.
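The adaptive-annealing point can be sketched as an ESS-controlled choice of the next inverse temperature, found by bisection; the function name, target ESS fraction, and mock log likelihoods below are illustrative assumptions rather than the SGAIS schedule itself:

```python
import numpy as np

def next_temperature(log_like, beta, ess_frac=0.5):
    """Pick the next inverse temperature so the incremental importance
    weights w_i = exp((beta' - beta) * log_like_i) retain a target ESS.

    log_like : per-particle log likelihoods at the current temperature
    Returns beta' in (beta, 1].
    """
    M = len(log_like)

    def ess(beta_new):
        lw = (beta_new - beta) * log_like
        lw -= lw.max()                         # stabilize the exponentials
        w = np.exp(lw)
        return w.sum() ** 2 / (w ** 2).sum()   # effective sample size

    if ess(1.0) >= ess_frac * M:               # can jump straight to beta = 1
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(50):                        # bisect on the decreasing ESS curve
        mid = 0.5 * (lo + hi)
        if ess(mid) >= ess_frac * M:
            lo = mid
        else:
            hi = mid
    return lo

rng = np.random.default_rng(4)
ll = rng.normal(-50.0, 5.0, size=1000)         # mock per-particle log likelihoods
```

Choosing each step so that ESS never collapses keeps the incremental importance weights well conditioned, at the cost of a data-dependent (and hence adaptive) number of annealing steps.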

5. Comparative Performance and Methodological Distinctions

Recursive estimators are compared against:

  • Profile likelihood, which plugs in the mixing density estimate at the final observation and fails to account for uncertainty in $f$ (Martin et al., 2011).
  • Dirichlet process mixture marginal likelihoods, which are Bayesian but computationally demanding due to MCMC or importance sampling requirements.
  • Nested sampling, which operates via live-point constrained prior sequences and does not admit a straightforward central limit theorem for its normalization constant (Cameron et al., 2013).

Recursive estimator benefits include unbiasedness (SGAIS), low computational cost per data update (PRML and SGAIS), and flexibility for online, streaming, and empirical Bayes applications. Simulation studies indicate marginal likelihood estimates from PRML track closely with fully Bayesian methods in density estimation and are more stable than profile likelihood (Martin et al., 2011). SGAIS achieves accuracy within 0.1–0.6% of nested sampling/AIS and orders-of-magnitude speedups (Cameron et al., 2019).

6. Extensions, Applications, and Sensitivity Analysis

Recursive marginal likelihood estimators are extensible to:

  • Mixed-effects and regression models via nonparametric mixing over random effects or coefficients (Martin et al., 2011).
  • Empirical Bayes multiple testing in time series, exploiting mixtures over AR parameters and learning the degree of sparsity (Martin et al., 2011).
  • Prior-sensitivity analysis by constructing pseudo-mixture distributions over pooled draws, facilitating importance reweighting and effective sample size monitoring (Cameron et al., 2013).
  • Fully Bayesian hybrids, augmenting PRML with curvature-based sampling to propagate uncertainty (Martin et al., 2011).
  • Dynamic data assimilation and streaming Bayesian model selection, enabled by SGAIS (Cameron et al., 2019).

Applied examples include finite and infinite normal mixture models for astronomical data, sparse logistic regression with random effects, and high-dimensional statistical audio analysis (Cameron et al., 2013, Bortoli et al., 2019). Sensitivity analyses reveal marked shifts in model selection posteriors under alternative priors, corroborating the need for routine robustness assessment via recursive estimators.

7. Recommendations and Practitioner Guidance

For effective deployment of recursive marginal likelihood estimators:

  • Select bridging sequences to minimize divergence and ensure adequate overlap (e.g., power-posterior or partial-data paths with tailored spacing parameters).
  • Include the prior as a bridging density to stabilize recursion.
  • Refine bridging steps where thermodynamic integration via importance sampling indicates rapid integrand variation (Cameron et al., 2013).
  • Monitor effective sample size and variance of the estimator; leverage bootstrap or asymptotic covariance formulas for error bars.
  • Combine with nested sampling draws when likelihood evaluation is costly, exploiting the ability to reuse samples and reduce estimator variance.
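The variance-monitoring advice can be sketched with a nonparametric bootstrap error bar for a simple importance-sampling evidence estimate; the toy prior/likelihood pair is an illustrative assumption chosen so the true normalizing constant is known:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy setup: prior N(0,1), likelihood L(theta) = exp(-(theta-1)^2 / 2);
# then Z = integral L * prior = exp(-1/4)/sqrt(2) in closed form.
theta = rng.normal(0.0, 1.0, size=4000)        # draws from the prior
w = np.exp(-0.5 * (theta - 1.0) ** 2)          # importance weights L(theta_i)
Z_hat = w.mean()                               # Monte Carlo evidence estimate
ess = w.sum() ** 2 / (w ** 2).sum()            # effective sample size diagnostic

# Nonparametric bootstrap standard error for Z_hat
boot = np.array([rng.choice(w, size=w.size, replace=True).mean()
                 for _ in range(500)])
se = boot.std(ddof=1)
```

Reporting $\hat{Z}$ together with its bootstrap standard error and the ESS makes it immediately visible when a bridging sequence or proposal is too far from the target to trust the estimate.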

Reporting both model selection metrics and prior-sensitivity results is considered best practice to assess the robustness of inferential conclusions.


References:

  • "Semiparametric inference in mixture models with predictive recursion marginal likelihood" (Martin et al., 2011)
  • "Stochastic Gradient Annealed Importance Sampling for Efficient Online Marginal Likelihood Estimation" (Cameron et al., 2019)
  • "Recursive Pathways to Marginal Likelihood Estimation with Prior-Sensitivity Analysis" (Cameron et al., 2013)
  • "Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Application to maximum marginal likelihood and empirical Bayesian estimation" (Bortoli et al., 2019)
