Recursive Marginal Likelihood Estimators
- Recursive marginal likelihood estimators are iterative algorithms that approximate Bayesian evidence by constructing intermediate distributions between the prior and posterior.
- They utilize methods such as predictive recursion, stochastic gradient annealed importance sampling, and bridge sampling to handle high-dimensional and latent variable models.
- Their efficient computational design and proven convergence guarantees make them vital for robust Bayesian model selection and evidence computation.
Recursive marginal likelihood estimators comprise a class of algorithms designed to estimate the marginal likelihood (evidence) in Bayesian models via iterative, sequential, or recursive procedures. These estimators are central to Bayesian model selection, arise in high-dimensional latent variable and semiparametric mixture models, and enable practical evidence computation where direct integration is infeasible. The recursive approach leverages filtering, sequential Monte Carlo, or stochastic approximation techniques to exploit the structure of the model or data and to update estimates efficiently as data or parameters change.
1. Theoretical Foundations and Model Setup
Marginal likelihood estimation is required to compute $p(y) = \int p(y \mid \theta)\,\pi(\theta)\,d\theta$, the normalizing constant of the product of the likelihood and prior in Bayesian inference. In scenarios where the likelihood is intractable or the model is high-dimensional, naïve numerical quadrature is prohibitive. Recursive marginal likelihood estimators address this by constructing a sequence of intermediate measures or densities that bridge the prior and posterior.
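For intuition, when $\theta$ is one-dimensional the evidence integral can still be evaluated by brute-force quadrature; the conjugate-normal example below (an illustrative model choice, not one from the cited papers) has the closed form $p(y) = N(y;\, 0,\, 2)$ to check against:

```python
import numpy as np

# Evidence p(y) = ∫ p(y|t) pi(t) dt for the toy model y | t ~ N(t, 1), t ~ N(0, 1).
# Conjugacy gives the closed form p(y) = N(y; 0, 2).
y = 0.7
t = np.linspace(-10.0, 10.0, 20001)          # quadrature grid over the prior support
dt = t[1] - t[0]
likelihood = np.exp(-0.5 * (y - t) ** 2) / np.sqrt(2.0 * np.pi)
prior = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
evidence = np.sum(likelihood * prior) * dt   # Riemann-sum approximation of the integral
exact = np.exp(-0.25 * y ** 2) / np.sqrt(4.0 * np.pi)
```

This succeeds only because the parameter is scalar; the per-dimension grid cost is exactly what makes quadrature prohibitive in high dimensions.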
Common model structures targeted by recursive estimators include:
- Mixture models: $p(y \mid \theta, f) = \int k(y \mid x, \theta)\, f(x)\, dx$, with structural parameter $\theta$ and nonparametric mixing density $f$ (Martin et al., 2011).
- Latent variable models: $p(y \mid \theta) = \int p(y, z \mid \theta)\, dz$, with integration over latent variables $z$ (Bortoli et al., 2019).
- Sequential data models: i.i.d. or conditionally independent $y_1, \dots, y_n$, amenable to product-rule factorization of the evidence, $p(y_{1:n}) = \prod_{i=1}^{n} p(y_i \mid y_{1:i-1})$ (Cameron et al., 2019).
2. Major Recursive Marginal Likelihood Estimation Methods
2.1 Predictive Recursion Marginal Likelihood (PRML)
PRML refines the estimation of the structural parameter $\theta$ in semiparametric mixture models $p(y \mid \theta, f) = \int k(y \mid x, \theta)\, f(x)\, dx$. For fixed $\theta$, the predictive recursion (PR) updates the unknown mixing density $f$ sequentially:
For $i = 1, \dots, n$:
- Compute the predictive mixture at $y_i$: $m_{i-1}(y_i) = \int k(y_i \mid x, \theta)\, f_{i-1}(x)\, dx$.
- Update: $f_i(x) = (1 - w_i)\, f_{i-1}(x) + w_i\, k(y_i \mid x, \theta)\, f_{i-1}(x) / m_{i-1}(y_i)$.
The PR marginal likelihood is then built as $L^{\mathrm{PR}}(\theta) = \prod_{i=1}^{n} m_{i-1}(y_i)$ (Martin et al., 2011).
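On a fixed grid for the mixing density, the recursion above can be sketched as follows; the uniform initialization $f_0$, the grid resolution, and the weight schedule $w_i = (i+1)^{-\gamma}$ are illustrative assumptions, not prescriptions from the paper:

```python
import numpy as np

def pr_log_marginal_likelihood(y, theta, grid, kernel, gamma=0.67):
    """Predictive recursion on a fixed grid, for fixed structural theta.

    kernel(y_i, x, theta) -> k(y_i | x, theta) evaluated on the grid x.
    Returns log L^PR(theta) = sum_i log m_{i-1}(y_i).
    Grid, initial f_0, and gamma are illustrative choices.
    """
    dx = grid[1] - grid[0]
    f = np.full_like(grid, 1.0 / (grid[-1] - grid[0]))  # uniform initial guess f_0
    log_L = 0.0
    for i, yi in enumerate(y, start=1):
        k = kernel(yi, grid, theta)
        m = np.sum(k * f) * dx           # predictive mixture m_{i-1}(y_i)
        log_L += np.log(m)
        w = (i + 1.0) ** (-gamma)        # decaying PR weight w_i
        f = (1.0 - w) * f + w * k * f / m  # PR update for f_i
    return log_L
```

A single pass over the data yields the likelihood for one candidate $\theta$; maximizing then requires repeating the pass per candidate, consistent with the per-candidate cost noted below.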
2.2 Sequential Recursive Importance and Annealing
Marginal likelihood is factorized as $p(y_{1:n}) = \prod_{i=1}^{n} p(y_i \mid y_{1:i-1})$, with each predictive $p(y_i \mid y_{1:i-1})$ approximated recursively. Stochastic Gradient Annealed Importance Sampling (SGAIS) interleaves annealed importance sampling (AIS) with stochastic gradient MCMC for online, chunk-wise estimation, using unbiased mini-batch approximations and adaptive temperature scheduling (Cameron et al., 2019).
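A minimal, non-stochastic-gradient AIS sketch illustrates the annealed estimate for a single evidence term, using a standard-normal prior and a Gaussian pseudo-likelihood whose evidence is known in closed form; all tuning constants are illustrative, and SGAIS additionally replaces full-data likelihoods with mini-batch estimates:

```python
import numpy as np

def ais_log_evidence(log_like, n_particles=2000, n_temps=51, seed=1):
    """Annealed importance sampling for Z = ∫ N(x; 0, 1) exp(log_like(x)) dx.

    Anneals pi_b ∝ prior(x) * exp(b * log_like(x)) from b=0 to b=1.
    Prior, transition kernel, and schedule are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.standard_normal(n_particles)          # exact draws from the prior
    logw = np.zeros(n_particles)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        logw += (b1 - b0) * log_like(x)           # incremental AIS weight
        # one random-walk Metropolis step targeting prior(x) * like(x)^b1
        prop = x + 0.5 * rng.standard_normal(n_particles)
        log_acc = (-0.5 * prop ** 2 + b1 * log_like(prop)) \
                - (-0.5 * x ** 2 + b1 * log_like(x))
        x = np.where(np.log(rng.random(n_particles)) < log_acc, prop, x)
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))  # log of the AIS weight average
```

The average of the importance weights is unbiased for the evidence, which is the property SGAIS inherits chunk by chunk.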
2.3 Recursive Bridge Sampling and Biased Sampling
Recursive estimation of normalization constants $Z_t$ for intermediate "bridging" densities $\pi_t(\theta) \propto q_t(\theta)$, $t = 0, \dots, T$, spanning from prior to posterior, is governed by the core self-consistency identity:
$$\hat{Z}_t = \sum_{i=1}^{N} \frac{q_t(\theta_i)}{\sum_{s=0}^{T} n_s\, q_s(\theta_i) / \hat{Z}_s},$$
where $\{\theta_i\}_{i=1}^{N}$ pools the $n_s$ draws taken from each $\pi_s$ (Cameron et al., 2013). This covers the "biased sampling," "reverse logistic regression," and "density of states" methodologies.
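The self-consistency relation can be solved by fixed-point iteration over the pooled draws; the sketch below is illustrative (the small `logsumexp` helper, the iteration count, and pinning the reference constant $Z_0 = 1$ are implementation assumptions):

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log-sum-exp along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def biased_sampling_constants(log_q, counts, iters=300):
    """Fixed-point ("self-consistency") iteration shared by biased sampling
    and reverse logistic regression, for T+1 bridging densities pi_t ∝ q_t.

    log_q[t, i] = log q_t(x_i) over the pooled draws x_1..x_N
    counts[t]   = number of draws contributed by pi_t
    Returns Z_t relative to Z_0 = 1 (only ratios are identified).
    """
    log_Z = np.zeros(log_q.shape[0])
    log_n = np.log(np.asarray(counts, dtype=float))
    for _ in range(iters):
        # per-draw denominator: log sum_s n_s q_s(x_i) / Z_s
        log_den = logsumexp(log_n[:, None] + log_q - log_Z[:, None], axis=0)
        log_Z = logsumexp(log_q - log_den[None, :], axis=1)
        log_Z -= log_Z[0]                 # pin the reference constant
    return np.exp(log_Z)
```

Because the identity determines the constants only up to a common scale, one density (here $\pi_0$) serves as the reference, mirroring the role of the prior as an anchoring bridge.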
2.4 Stochastic Approximation via Unadjusted Langevin Monte Carlo (ULMC)
Intractable expectations in the evidence gradient are replaced by Monte Carlo averages over ULMC chains targeting the posterior of the latent variables $z$ given $(y, \theta_n)$, iteratively updating $\theta$ by Robbins–Monro stochastic approximation:
$$\theta_{n+1} = \theta_n + \gamma_{n+1}\, \Delta_n,$$
where $\Delta_n$ is the Monte Carlo estimate of the gradient $\nabla_\theta \log p(y \mid \theta_n)$ (Bortoli et al., 2019).
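A toy sketch of the scheme, for an assumed conjugate-normal latent model where the marginal maximizer is known to be the sample mean (the model, step sizes, and chain lengths are illustrative, not those of Bortoli et al.):

```python
import numpy as np

def soul_sketch(y, n_iter=2000, n_ula=20, h=0.05, seed=0):
    """Robbins-Monro + unadjusted Langevin sketch for the toy model
    z_j ~ N(theta, 1), y_j | z_j ~ N(z_j, 1); the marginal MLE is mean(y).
    """
    rng = np.random.default_rng(seed)
    theta = 0.0
    z = np.zeros_like(y)                     # warm-started latent chain
    for k in range(1, n_iter + 1):
        g = 0.0
        for _ in range(n_ula):
            # ULA step targeting p(z | y, theta); grad_z = (y - z) + (theta - z)
            z = z + 0.5 * h * (y + theta - 2.0 * z) \
                + np.sqrt(h) * rng.standard_normal(z.shape)
            g += np.mean(z - theta)          # grad of n^{-1} log p(y, z | theta)
        theta += (1.0 / k ** 0.6) * (g / n_ula)   # decreasing Robbins-Monro step
    return theta
```

Warm-starting the Langevin chain across outer iterations is what keeps the per-iteration cost proportional to the (short) inner-loop length.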
3. Convergence and Theoretical Guarantees
Table: Convergence Properties of Key Recursive Marginal Likelihood Estimators
| Method | Main Convergence Guarantee | Reference |
|---|---|---|
| Predictive recursion (PRML) | Pointwise a.s. convergence of the PR estimates; Wald-type consistency of the PRML maximizer | (Martin et al., 2011) |
| Biased/reverse LR recursion | Asymptotic normality; all $\hat{Z}_t$ converge jointly | (Cameron et al., 2013) |
| SGAIS | Unbiasedness; variance decays with the number of particles | (Cameron et al., 2019) |
| Stochastic Approx. (ULMC) | A.s. convergence of $\theta_n$ to the maximizer $\theta^\star$, with explicit rates | (Bortoli et al., 2019) |
For predictive recursion, under conditions on the kernel $k$, compactness of the underlying parameter space, decay of the weights $w_i$, and a second-moment condition, one obtains almost sure pointwise convergence and, under uniqueness, Wald consistency and explicit rates. The stochastic gradient estimators provide non-asymptotic convergence bounds, with rates matching standard stochastic gradient theory given suitable step-size sequences and kernels.
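The weight-decay requirement is the standard stochastic-approximation condition; in the notation above it can be written as:

```latex
w_i > 0, \qquad \sum_{i=1}^{\infty} w_i = \infty, \qquad \sum_{i=1}^{\infty} w_i^2 < \infty,
```

satisfied, for instance, by polynomially decaying weights $w_i = (i+1)^{-\gamma}$ with $\gamma \in (1/2, 1]$.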
4. Algorithmic Realizations and Practical Considerations
Recursive estimators achieve favorable computational complexity:
- PRML: $O(n)$ cost per candidate $\theta$ in mixture models, with a polynomially decaying weight sequence $w_i$ governed by a decay parameter $\gamma$.
- Biased sampling/RLR: cost scaling with the product of the number of bridging densities and the total number of pooled samples per fixed-point iteration; draws must be i.i.d.
- SGAIS: cost per data chunk scales with the number of annealing steps, the number of particles, and the mini-batch size, but does not grow with the total dataset size (Cameron et al., 2019).
- Stochastic approximation with ULMC: per-iteration cost proportional to the number of inner-loop ULMC steps.
Key steps in algorithmic implementation include:
- Choice of weight or step-size sequences to control convergence and stability.
- Ensuring sufficient overlap of bridging densities (e.g., power posteriors or partial-data posteriors) (Cameron et al., 2013).
- Adaptive annealing schedules to maintain effective sample size in SGAIS.
- Warm-starting in high-dimensional latent variable models for computational efficiency.
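The adaptive-annealing step above can be implemented by bisecting for the next temperature that holds the effective sample size (ESS) of the incremental importance weights at a target fraction; the sketch assumes incremental weights of the form $\exp((\beta' - \beta)\,\log L_i)$ and an illustrative bisection depth:

```python
import numpy as np

def next_temperature(log_like, beta, target_frac=0.5):
    """Bisect for the next annealing temperature beta' > beta such that the
    ESS of the incremental weights w_i ∝ exp((beta' - beta) * log_like_i)
    is target_frac * N.  target_frac and the bisection depth are assumptions.
    """
    N = len(log_like)

    def ess_frac(b):
        lw = (b - beta) * log_like
        lw -= lw.max()                  # stabilize before exponentiating
        w = np.exp(lw)
        return (w.sum() ** 2) / (N * (w ** 2).sum())

    if ess_frac(1.0) >= target_frac:
        return 1.0                      # can jump straight to the posterior
    lo, hi = beta, 1.0                  # invariant: ESS(lo) >= target > ESS(hi)
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ess_frac(mid) >= target_frac else (lo, mid)
    return lo
```

Tighter temperature spacing is chosen automatically wherever the likelihood varies rapidly, which is the same overlap principle recommended for bridging sequences.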
5. Comparative Performance and Methodological Distinctions
Recursive estimators are compared against:
- Profile likelihood, which plugs in the mixing density estimate at the final observation and fails to account for uncertainty in $f$ (Martin et al., 2011).
- Dirichlet process mixture marginal likelihoods, which are Bayesian but computationally demanding due to MCMC or importance sampling requirements.
- Nested sampling, which operates via live-point constrained prior sequences and does not admit a straightforward central limit theorem for its normalization constant (Cameron et al., 2013).
Recursive estimator benefits include unbiasedness (SGAIS), low computational cost per data update (PRML and SGAIS), and flexibility for online, streaming, and empirical Bayes applications. Simulation studies indicate marginal likelihood estimates from PRML track closely with fully Bayesian methods in density estimation and are more stable than profile likelihood (Martin et al., 2011). SGAIS achieves accuracy within 0.1–0.6% of nested sampling/AIS and orders-of-magnitude speedups (Cameron et al., 2019).
6. Extensions, Applications, and Sensitivity Analysis
Recursive marginal likelihood estimators are extensible to:
- Mixed-effects and regression models via nonparametric mixing over random effects or coefficients (Martin et al., 2011).
- Empirical Bayes multiple testing in time series, exploiting mixtures over AR parameters and learning the degree of sparsity (Martin et al., 2011).
- Prior-sensitivity analysis by constructing pseudo-mixture distributions over pooled draws, facilitating importance reweighting and effective sample size monitoring (Cameron et al., 2013).
- Fully Bayesian hybrids, augmenting PRML with curvature-based sampling to propagate uncertainty (Martin et al., 2011).
- Dynamic data assimilation and streaming Bayesian model selection, enabled by SGAIS (Cameron et al., 2019).
Applied examples include finite and infinite normal mixture models for astronomical data, sparse logistic regression with random effects, and high-dimensional statistical audio analysis (Cameron et al., 2013, Bortoli et al., 2019). Sensitivity analyses reveal marked shifts in model selection posteriors under alternative priors, corroborating the need for routine robustness assessment via recursive estimators.
7. Recommendations and Practitioner Guidance
For effective deployment of recursive marginal likelihood estimators:
- Select bridging sequences to minimize divergence and ensure adequate overlap (e.g., power-posterior or partial-data-posterior paths with tailored spacing parameters).
- Include the prior as a bridging density to stabilize recursion.
- Refine bridging steps where thermodynamic integration via importance sampling indicates rapid integrand variation (Cameron et al., 2013).
- Monitor effective sample size and variance of the estimator; leverage bootstrap or asymptotic covariance formulas for error bars.
- Combine with nested sampling draws when likelihood evaluation is costly, exploiting the ability to reuse samples and reduce estimator variance.
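For the bootstrap error bars mentioned above, a minimal resampling sketch over log-weights (the i.i.d. assumption on the weights and the replicate count are illustrative; correlated draws would require a block bootstrap):

```python
import numpy as np

def bootstrap_se_log_evidence(logw, B=500, seed=2):
    """Bootstrap standard error for log Z-hat = log mean(exp(logw)),
    assuming roughly i.i.d. importance/AIS log-weights."""
    rng = np.random.default_rng(seed)

    def log_mean_exp(lw):
        m = lw.max()
        return m + np.log(np.mean(np.exp(lw - m)))

    # resample the weights with replacement and re-estimate log Z each time
    reps = np.array([log_mean_exp(rng.choice(logw, size=logw.size))
                     for _ in range(B)])
    return reps.std(ddof=1)
```

The spread of the replicates gives a direct, distribution-free error bar on the log-evidence estimate, complementing asymptotic covariance formulas.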
Reporting both model selection metrics and prior-sensitivity results is considered best practice to assess the robustness of inferential conclusions.
References:
- "Semiparametric inference in mixture models with predictive recursion marginal likelihood" (Martin et al., 2011)
- "Stochastic Gradient Annealed Importance Sampling for Efficient Online Marginal Likelihood Estimation" (Cameron et al., 2019)
- "Recursive Pathways to Marginal Likelihood Estimation with Prior-Sensitivity Analysis" (Cameron et al., 2013)
- "Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Application to maximum marginal likelihood and empirical Bayesian estimation" (Bortoli et al., 2019)