Papers
Topics
Authors
Recent
Search
2000 character limit reached

Empirical Bayes via Predictive Recursion

Updated 26 February 2026
  • The paper introduces a predictive recursion algorithm to nonparametrically estimate mixing distributions, yielding fast and consistent empirical Bayes rules.
  • It outlines a semiparametric framework where marginal likelihood maximization efficiently estimates structural parameters without requiring computationally intensive MCMC sampling.
  • Applications include large-scale multiple testing in hierarchical models, achieving improved tail-fitting, enhanced power, and realistic control over false discovery rates.

Nonparametric Empirical Bayes via Predictive Recursion refers to a methodology for empirical Bayes inference in mixture models in which the mixing distribution is estimated nonparametrically using the predictive recursion (PR) algorithm. This approach enables semiparametric or fully nonparametric estimation of the prior in hierarchical models, yielding fast and consistent empirical Bayes rules for large-scale inference, such as multiple testing, without resorting to computationally intensive Markov chain Monte Carlo. The core innovation is the PR algorithm's ability to efficiently learn an unknown mixing distribution, often in concert with maximum marginal likelihood estimation of parametric or structural components.

1. Hierarchical Models and the Empirical Bayes Problem

In empirical Bayes (EB) settings, the data {Xi}i=1n\{X_i\}_{i=1}^n are modeled as conditionally independent given latent parameters {θi}\{\theta_i\}, each drawn i.i.d. from an unknown prior FF. The observation-level model is XiK(θi,ϑ)X_i \sim K(\cdot|\theta_i, \vartheta), with KK a known kernel and ϑ\vartheta a finite-dimensional nuisance or structural parameter. The marginal likelihood is a mixture:

f(x;ϑ)=K(xθ,ϑ)dF(θ)f(x; \vartheta) = \int K(x | \theta, \vartheta) \, dF(\theta)

The empirical Bayes objective is to estimate FF (nonparametrically) and ϑ\vartheta (typically by marginal likelihood), then use the fitted model to form data-driven decisions or estimates (e.g., via plug-in Bayes rules) (Martin et al., 2011, Martin, 2012).

A canonical application is the two-groups model for large-scale multiple testing, where ZiZ_i are test statistics assumed arise from a mixture:

f(z)=πf0(z)+(1π)f1(z)f(z) = \pi f_0(z) + (1-\pi) f_1(z)

with f0f_0 representing the null distribution (often an empirical null with unknown mean/variance), and f1f_1 a nonparametric alternative, typically modeled as a continuous mixture of the null kernel shifted or stretched (Martin et al., 2011).

2. The Predictive Recursion Algorithm

Predictive recursion (PR), introduced for nonparametric mixture estimation, produces a sequence of estimates for the unknown mixing density p(u)p(u) (or, more generally, mixing measure Ψ\Psi) by recursively updating its previous value in light of each new data point. For observations X1,...,XnX_1, ..., X_n and structural parameter θ\theta fixed, the PR update is:

mi1,θ(x)=K(xu,θ)pi1(u)du pi(u)=(1wi)pi1(u)+wiK(Xiu,θ)pi1(u)mi1,θ(Xi)\begin{aligned} m_{i-1,\theta}(x) &= \int K(x|u,\theta) \, p_{i-1}(u)\, du \ p_i(u) &= (1-w_i) p_{i-1}(u) + w_i \frac{K(X_i|u,\theta) p_{i-1}(u)}{m_{i-1,\theta}(X_i)} \end{aligned}

with an initial guess p0(u)p_0(u) and a weight sequence {wi}\{w_i\} satisfying wi=\sum w_i = \infty, wi2<\sum w_i^2 < \infty (e.g., wi=(i+1)γ,γ(2/3,1]w_i = (i+1)^{-\gamma}, \gamma \in (2/3, 1]). After nn steps, pn(u)p_n(u) is the final estimate (Martin et al., 2011, Martin, 2012, Martin et al., 2011).

Key features include:

  • Online recursion: One pass through the data, updating with each XiX_i.
  • No MCMC required: Orders of magnitude faster than nonparametric Bayes via Dirichlet process mixtures.
  • Dependence on order: PR is not permutation invariant, so averaging over a small number of random permutations is advisable.

3. Predictive Recursion Marginal Likelihood and Parameter Estimation

In semiparametric settings, PR forms the basis for estimating finite-dimensional structural parameters by constructing a marginal likelihood. For parameter θ\theta and data {Xi}\{X_i\}, the PR-marginal likelihood is

LPR(θ)=i=1nmi1,θ(Xi)L_{\mathrm{PR}}(\theta) = \prod_{i=1}^n m_{i-1,\theta}(X_i)

where mi1,θ(Xi)m_{i-1,\theta}(X_i) is the current mixture density at XiX_i given θ\theta and past estimates pi1p_{i-1}.

Rather than a plug-in profile likelihood (which can be unstable), maximizing LPRL_{\mathrm{PR}} provides a computationally efficient, consistent estimator for θ\theta. Regularization is typically introduced by adding a (weakly informative) prior on θ\theta, penalizing implausible parameter values. Standard optimization routines (e.g., BFGS, quasi-Newton) are used for argmaxθLPR(θ)\arg\max_\theta L_{\mathrm{PR}}(\theta) (Martin et al., 2011, Martin et al., 2011).

The algorithm is summarized as:

  1. For each candidate θ\theta, run PR over {Xi}\{X_i\} to obtain LPR(θ)L_{\mathrm{PR}}(\theta).
  2. Maximize LPR(θ)L_{\mathrm{PR}}(\theta) (plus log-prior, if regularized) to obtain θ^\hat\theta.
  3. Run PR again at θ^\hat\theta to obtain the final mixing density estimate.

4. Application to Large-Scale Multiple Testing: The PRtest Procedure

In large-scale multiple testing, PR enables a fully nonparametric EB version of the two-groups model with empirical null. The null density f0(z)f_0(z) is typically N(zμ,σ2)N(z|\mu,\sigma^2) with both mean and variance estimated from data; f1(z)f_1(z) is constructed as a nonparametric location mixture:

f1(z)=u[1,1]N(zμ+τσu,σ2)ψ(u)duf_1(z) = \int_{u\in[-1,1]} N\big(z \mid \mu+\tau\sigma u,\, \sigma^2\big) \psi(u) du

where ψ\psi is an unknown density and τ1\tau \ge 1 is a stretching parameter for the alternative (Martin et al., 2011).

The specific steps for PRtest are:

  • Fit the mixture model via PR and marginal likelihood maximization, simultaneously estimating (μ,σ,τ,π0)(\mu, \sigma, \tau, \pi_0) and ψ(u)\psi(u), using weakly-informative priors for regularization.
  • Compute the estimated marginal density f^(z)=π^N(zμ^,σ^2)+(1π^)f^1(z)\hat f(z) = \hat\pi N(z|\hat\mu, \hat\sigma^2) + (1-\hat\pi) \hat f_1(z).
  • Estimate the local false discovery rate (lFDR):

lFDR^(z)=π^N(zμ^,σ^2)f^(z)\widehat{\mathrm{lFDR}}(z) = \frac{\hat\pi N(z|\hat\mu, \hat\sigma^2)}{\hat f(z)}

  • Apply the decision rule: declare ZiZ_i non-null if lFDR^(Zi)<r\widehat{\mathrm{lFDR}}(Z_i)<r for a threshold rr (typically $0.1$).

This procedure provides both better null estimation and improved tail-fit for f1f_1, enabling higher power and more realistic control over empirical FDR, validated in simulation and on microarray data (Martin et al., 2011).

5. Theoretical Properties: Consistency, Identifiability, and Asymptotic Optimality

PR admits strong theoretical guarantees:

  • Consistency: Under mild conditions (weak compactness of candidate prior family, continuity of kernels, weight decay), the estimated mixing distribution FnF_n converges weakly to the oracle FF as nn\to\infty, and the mixture density converges in Kullback-Leibler divergence (Martin, 2012, Martin et al., 2011).
  • Identifiability: For the two-groups model with τ>1\tau > 1 and smooth kernels, the mapping (μ,σ,π,τ,ψ)πf0+(1π)f1(\mu, \sigma, \pi, \tau, \psi) \mapsto \pi f_0 + (1-\pi) f_1 is injective (Martin et al., 2011).
  • Asymptotic optimality of EB rules: Data-driven plug-in EB rules using FnF_n (for hypothesis testing, estimation under square loss, etc.) are asymptotically Bayes optimal, i.e., their frequentist risk converges to the Bayes oracle risk under the true prior (Martin, 2012). Sufficient conditions for this optimality include boundedness and dominated convergence for losses and convergence of FnF_n.

6. Computational Complexity and Practical Implementation

The PR algorithm is computationally efficient:

  • Complexity: One pass of PR over nn samples for a mixing density represented on an mm-point grid requires O(nm)O(n m) operations.
  • No MCMC: Unlike Dirichlet process mixture approaches, there is no need to store or sample cluster allocations. The marginal likelihood is available as a byproduct of the main recursion.
  • Order dependence: While PR is order dependent, permutation averaging over a modest number of random orderings nearly eliminates this effect.
  • Parameter tuning: Decay rate γ\gamma (for wiw_i) and initial guess for the mixing density should be chosen judiciously; in practice, γ2/3\gamma\approx 2/3 or $1/(n+1)$ for weights provides reliable convergence. Grid choice in uu is a tradeoff between computational cost and resolution.

7. Empirical Evidence and Comparative Performance

Extensive simulation and real-data analysis indicate that PR-based nonparametric EB methods (e.g., PRtest) achieve:

  • Accurate estimation of the mixing proportion π\pi over a wide range of sparsity levels, often outperforming both parametric EB and alternative nonparametric methods (e.g., Jin–Cai, mixfdr) in estimation and tail fit.
  • Empirical FDR and Bayes risk curves closely matched to Bayes oracle procedures across realistic data-generating scenarios.
  • Competitive power and false non-discovery rates: In the "golden spike" microarray benchmark, PRtest closely matched the oracle in number of true discoveries, unlike methods that grossly under- or overestimate signal counts.
  • Improved tail-fitting and signal detection in real microarray datasets, correctly flagging significant genes missed by approaches that overshrink the estimated null component.

These results demonstrate that predictive recursion, when coupled with marginal likelihood optimization and nonparametric mixing, delivers empirical Bayes procedures for large-scale inference that are both theoretically grounded and practically competitive (Martin et al., 2011, Martin, 2012, Martin et al., 2011).

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Nonparametric Empirical Bayes via Predictive Recursion.