
Normal-Beta Prime Prior

Updated 1 July 2025
  • The Normal Beta Prime (NBP) prior is a continuous shrinkage prior using a normal scale mixture with a beta prime distribution on the variance component, unifying priors like the horseshoe.
  • It provides tunable hyperparameters for sparsity and tail robustness, achieving theoretical optimality and computational tractability for high-dimensional Bayesian models.
  • Applications include high-dimensional regression for variable selection and regularization, multiple hypothesis testing, and connections to nonparametric Bayesian methods.

The Normal Beta Prime (NBP) prior is a continuous shrinkage prior constructed by mixing a normal distribution with a beta prime distribution on its variance component. This form provides a unifying, highly flexible family of priors for regularization and variable selection in high-dimensional Bayesian models, subsuming well-known priors such as the horseshoe, Strawderman-Berger, and normal-exponential-gamma as special or limiting cases. The NBP prior is characterized by tunable hyperparameters controlling both the degree of sparsity and tail robustness, providing strong theoretical properties and computational tractability crucial for modern large-scale applications.

1. Mathematical Definition and Hierarchical Construction

The NBP prior is formally obtained by representing a parameter of interest $\theta_j$ (often a regression coefficient) as a scale mixture of normals:

$$\theta_j \mid \tau_j \sim N(0,\, \tau_j), \qquad \tau_j \sim \text{Beta}'(a, b, \phi),$$

where $\text{Beta}'(a, b, \phi)$ denotes the beta prime (also called inverted beta) distribution with density

$$\pi(\tau_j) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \phi^{-a}\, \tau_j^{a-1} \left(1 + \tau_j/\phi\right)^{-(a+b)}, \qquad \tau_j > 0, \quad a, b, \phi > 0.$$

For $\phi = 1$, this reduces to the standard beta prime. The NBP prior is thus a normal scale mixture whose mixing distribution on the variance is beta prime.

Equivalently, this prior can be expressed using a three-parameter beta (TPB) distribution (Armagan, Dunson, Clyde 2011):

$$\theta_j \mid \rho_j \sim \mathcal{N}(0,\, 1/\rho_j - 1), \qquad \rho_j \sim \mathcal{TPB}(a, b, \phi).$$

Because the beta prime arises as a ratio of independent gamma variables (equivalently, a gamma distribution whose rate is itself gamma-distributed), these representations are conditionally conjugate and admit efficient posterior computation via Gibbs sampling and variational Bayes (1107.4976).
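As a concrete illustration, the sketch below draws coefficients from the NBP prior using the ratio-of-gammas representation just mentioned: if $g_1 \sim \text{Gamma}(a, 1)$ and $g_2 \sim \text{Gamma}(b, 1)$ independently, then $\tau = \phi\, g_1/g_2 \sim \text{Beta}'(a, b, \phi)$. The function name and defaults are illustrative, not taken from the cited implementations.

```python
import numpy as np

def sample_nbp_prior(p, a=0.5, b=0.5, phi=1.0, seed=None):
    """Draw p coefficients from the Normal-Beta Prime prior.

    Uses the ratio-of-gammas representation of the beta prime:
    g1 ~ Gamma(a, 1), g2 ~ Gamma(b, 1)  =>  tau = phi * g1 / g2 ~ Beta'(a, b, phi),
    then theta | tau ~ N(0, tau).
    """
    rng = np.random.default_rng(seed)
    g1 = rng.gamma(shape=a, scale=1.0, size=p)
    g2 = rng.gamma(shape=b, scale=1.0, size=p)
    tau = phi * g1 / g2                  # beta prime variance components
    return rng.normal(loc=0.0, scale=np.sqrt(tau)), tau

# a = b = 1/2 with phi = 1 recovers horseshoe-distributed draws.
theta, tau = sample_nbp_prior(p=10_000, a=0.5, b=0.5, seed=0)
```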

2. Connections to Other Shrinkage Priors

The NBP prior unifies several well-known priors through specific choices of hyperparameters:

  • Horseshoe prior: $a = b = 1/2$, $\phi = 1$
  • Strawderman-Berger prior: $a = 1$, $b = 1/2$, $\phi = 1$
  • Normal-Exponential-Gamma (NEG): $a = 1$, $b > 0$
  • Cauchy and Laplace (double exponential): arise as boundary cases

This encompasses both spike-at-zero/strong-shrinkage behavior (for small $a$) and heavy-tailed robustness (for small $b$), overcoming the limitations of traditional heavy-tailed or Laplace priors, which either lack an infinite spike at zero or cannot robustly accommodate large coefficients (1107.4976).
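As a quick check of the horseshoe correspondence: the horseshoe puts a standard half-Cauchy prior on the local scale $\lambda_j$, and the change of variables $\tau_j = \lambda_j^2$ lands exactly on the $a = b = 1/2$, $\phi = 1$ beta prime:

$$p(\lambda_j) = \frac{2}{\pi(1+\lambda_j^2)} \quad (\lambda_j > 0) \;\Longrightarrow\; p(\tau_j) = \frac{2}{\pi(1+\tau_j)} \cdot \frac{1}{2\sqrt{\tau_j}} = \frac{1}{\pi}\, \tau_j^{-1/2}(1+\tau_j)^{-1},$$

which is the $\text{Beta}'(1/2, 1/2)$ density, since $\Gamma(1)/\Gamma(1/2)^2 = 1/\pi$.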

3. Theoretical Properties and Practical Implications

The core theoretical innovations and practical implications are:

  • Sparsity Inducing: For $a \to 0$, the marginal density of $\theta_j$ at zero is infinite, promoting strong shrinkage of small coefficients.
  • Heavy-Tail Robustness: Small $b$ yields heavy tails, reducing overshrinkage of large signals and protecting true nonzero coefficients.
  • Hyperparameter Flexibility: $\phi$ governs global shrinkage intensity, while $a$ and $b$ directly control spike and tail behavior. This enables the prior to be tuned or estimated from data for adaptive regularization (1807.06539).
  • Posterior Contraction: For high-dimensional regression with $p \gg n$ and underlying sparsity, careful selection (or adaptive estimation) of $a$ and $b$ achieves (near) minimax posterior contraction rates, enabling optimal inference in both sparse and dense settings. The NBP prior is "self-adaptive": empirical Bayes or MML estimation always produces positive (non-degenerate) hyperparameter estimates (1807.06539).
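The spike behavior can be read off directly from the scale-mixture integral (taking $\phi = 1$): the marginal density of $\theta_j$ at the origin is

$$\pi(0) = \int_0^\infty \frac{1}{\sqrt{2\pi\tau}}\, \pi(\tau)\, d\tau \;\propto\; \int_0^\infty \tau^{a - 3/2} (1+\tau)^{-(a+b)}\, d\tau,$$

which diverges at the lower limit exactly when $a \le 1/2$. Thus the horseshoe case $a = 1/2$ already has an unbounded density at zero, and the spike persists for all smaller $a$.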

4. Computational Methods and Scalability

The conjugate structure of the NBP prior supports efficient inference:

  • Gibbs Sampling: Conditionals are analytically tractable due to conjugacy, supporting reliable MCMC even for large $p$ (1107.4976); a sketch of the resulting blocked sampler for the normal means model follows this list.
  • Variational Bayes (VB): The normal-beta prime mixture admits closed-form coordinate updates, enabling scalable VB approximations with vectorized and parallelized implementations for massive $p$ (1107.4976).
  • EM and MCEM Algorithms: For empirical Bayes estimation of hyperparameters via marginal likelihood, the EM algorithm exploits the beta prime's gamma/inverse-gamma mixture representations. Monte Carlo EM (MCEM) and mean-field VB EM (MFVB) variants are available and implemented in the R package NormalBetaPrime. Unlike other global-local priors, hyperparameter estimation for the NBP prior avoids degeneracy to zero (1807.06539).
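For concreteness, here is a minimal Gibbs sampler for the simplest setting, the normal means model $X_i \sim N(\theta_i, 1)$ with $\theta_i \mid \tau_i \sim N(0, \tau_i)$ and $\tau_i \sim \text{Beta}'(a, b)$. It uses the gamma-rate augmentation $\tau_i \mid \xi_i \sim \text{Gamma}(a, \text{rate} = \xi_i)$, $\xi_i \sim \text{Gamma}(b, 1)$, which yields normal, generalized inverse Gaussian (GIG), and gamma full conditionals. This blocking is a standard consequence of the conjugacy described above, not necessarily the exact scheme of the cited papers.

```python
import numpy as np
from scipy.stats import geninvgauss

def nbp_gibbs_normal_means(x, a=0.5, b=0.5, n_iter=2000, seed=0):
    """Gibbs sampler for x_i ~ N(theta_i, 1), theta_i | tau_i ~ N(0, tau_i),
    tau_i ~ Beta'(a, b), via the augmentation
    tau_i | xi_i ~ Gamma(a, rate=xi_i),  xi_i ~ Gamma(b, 1)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    tau, xi = np.ones(n), np.ones(n)
    draws = np.empty((n_iter, n))
    for t in range(n_iter):
        # theta_i | tau_i, x_i ~ N(w_i * x_i, w_i) with w_i = tau_i / (1 + tau_i)
        w = tau / (1.0 + tau)
        theta = rng.normal(w * x, np.sqrt(w))
        # tau_i | theta_i, xi_i ~ GIG(p = a - 1/2, A = 2*xi_i, B = theta_i^2),
        # density proportional to tau^(p-1) * exp(-(A*tau + B/tau)/2)
        B = np.maximum(theta**2, 1e-12)          # guard against a zero coefficient
        A = 2.0 * xi
        tau = geninvgauss.rvs(a - 0.5, np.sqrt(A * B),
                              scale=np.sqrt(B / A), random_state=rng)
        # xi_i | tau_i ~ Gamma(a + b, rate = 1 + tau_i)
        xi = rng.gamma(a + b, 1.0 / (1.0 + tau))
        draws[t] = theta
    return draws
```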

For grouped regression, recent work (GRASP (2506.18092)) extends the NBP prior to include both local and group-level shrinkage parameters, enabling hierarchical modeling of sparsity and robustness across and within groups, with hyperparameters tunable or estimable within a single Gibbs plus Metropolis-Hastings framework.

5. Applications in Regression, Multiple Testing, and Nonparametric Bayes

High-dimensional Regression:

  • Empirical results on both synthetic and real datasets (e.g., gene expression) show that the NBP prior provides strong variable selection and predictive performance superior or comparable to the horseshoe, spike-and-slab, MCP, SCAD, elastic net, and others, adapting to the degree of sparsity present (1807.06539, 2506.18092).
  • Adaptive hyperparameter estimation allows the prior to become more sparse in sparse settings and more diffuse (ridge-like) in dense regimes.

Multiple Hypothesis Testing:

  • In large-scale multiple testing of normal means, thresholding the posterior shrinkage weight under the NBP prior achieves asymptotically optimal Bayes risk (the exact "ABOS" property) when the hyperparameter $a$ tracks the true sparsity. Empirical Bayes, REML, and hierarchical Bayes approaches for $a$ enable adaptive procedures with the same theoretical guarantees across the full range of sparsity (1807.02421).
  • The NBP prior, via continuous shrinkage, allows testing to be conducted via simple thresholding of posterior quantities without resorting to explicit point-mass or "two-group" spike-and-slab modeling, yet retains oracle-minimax risk.
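Operationally, with shrinkage weight $\kappa_i = 1/(1+\tau_i)$, observation $i$ is flagged as a signal when $\mathbb{E}(1-\kappa_i \mid X_i) > 1/2$. The sketch below evaluates this posterior quantity by one-dimensional quadrature under the standard beta prime ($\phi = 1$, unit noise variance); it is an illustrative computation, not the cited papers' implementation.

```python
import numpy as np
from scipy.integrate import quad

def posterior_nonshrinkage(x, a=0.5, b=0.5):
    """E[1 - kappa | x] = E[tau / (1 + tau) | x] for x ~ N(theta, 1),
    theta | tau ~ N(0, tau), tau ~ Beta'(a, b)."""
    def weight(tau):
        # N(x; 0, 1 + tau) times the Beta'(a, b) density, up to a
        # normalizing constant that cancels in the ratio below.
        log_w = (-0.5 * np.log(2 * np.pi * (1 + tau)) - 0.5 * x**2 / (1 + tau)
                 + (a - 1) * np.log(tau) - (a + b) * np.log1p(tau))
        return np.exp(log_w)
    num = quad(lambda t: t / (1 + t) * weight(t), 0, np.inf)[0]
    den = quad(weight, 0, np.inf)[0]
    return num / den

# Threshold the posterior non-shrinkage weight at 1/2:
for x in (0.5, 2.0, 4.0):
    e = posterior_nonshrinkage(x)
    print(f"x = {x}: E[1 - kappa | x] = {e:.3f} -> {'signal' if e > 0.5 else 'null'}")
```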

Nonparametric Bayesian Modeling:

  • The negative binomial process (NBP)—distinct from normal-beta prime but sharing the acronym—has been used to define random discrete probability measures, generalizing the Dirichlet and Poisson-Dirichlet processes. The resulting family can represent a wide class of nonparametric priors, offering increased flexibility in modeling over-dispersed or clustered data by controlling an extra trimming parameter (2307.00176). While not the same as the scale-mixture normal-beta prime prior, this illustrates the wider influence of beta prime-related constructions in Bayesian nonparametrics.

6. Mathematical Properties and Structural Insights

Distributional and Analytical Identities:

  • The beta prime distribution admits identities in law for its convolution and scaling (e.g., sum of independent beta primes involving product or ratio forms). These yield monotonicity and complete-monotonicity (CM/LCM) properties for associated Laplace and hypergeometric transforms, which are important in establishing infinite divisibility and stochastic ordering.
  • The Laplace transform of the NBP prior is tied to the confluent hypergeometric function of the second kind and supports probabilistic proof of inequalities and analytic properties critical in Bayesian inference and shrinkage estimation (2108.09244).
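For reference, one identity of this kind for the mixing variance $\tau \sim \text{Beta}'(a, b)$ follows directly from the standard integral representation of Tricomi's confluent hypergeometric function $U$:

$$\mathbb{E}\big[e^{-s\tau}\big] = \frac{1}{B(a,b)} \int_0^\infty e^{-s\tau}\, \tau^{a-1} (1+\tau)^{-(a+b)}\, d\tau = \frac{\Gamma(a+b)}{\Gamma(b)}\, U(a,\, 1-b,\, s), \qquad s > 0,$$

and complete monotonicity in $s$ is immediate, since this is the Laplace transform of a positive measure.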

Geometric and Random Matrix Interpretations:

  • The multivariate beta prime (used in NBP priors on $\mathbb{R}^d$) is rotationally invariant, heavy-tailed, and closed under projections. This property ensures the prior remains in the same class under marginalization and is suitable in high-dimensional and geometric probability contexts (2501.00671).
  • Free probability analogues (free beta prime) relate the NBP prior to spectral limits in random matrix theory and enrich connections to operator-valued and non-commutative priors (1906.00661).

7. Computational Guarantees and MCMC Ergodicity

Recent results prove geometric ergodicity of Gibbs samplers for the normal model with global-local shrinkage priors, including the NBP prior, under negative-moment conditions on the global scale parameter that are less restrictive than in previous work (2503.00538). This guarantees fast and reliable MCMC convergence for practically relevant NBP parameterizations, broadening the class of priors with theoretically assured computational stability.

Table: Summary of NBP Prior Features in Bayesian Regression

| Feature | NBP Prior | Comparison |
| --- | --- | --- |
| Shrinkage at 0 | Spike at zero (for $a \le 1/2$) | Laplace: spike but lighter tails |
| Heavy tails | Power-law; $b$ controls tails | Horseshoe: similar for certain $a, b$ |
| Adaptivity | $a, b$ learned from data, full sparse-to-dense spectrum | Horseshoe: $a, b$ often fixed |
| Computation | Conjugate hierarchy, efficient Gibbs/VB | Many alternatives are non-conjugate |
| Minimaxity | Near-minimax contraction rates, formal risk optimality | Spike-and-slab: only under discrete selection |
| Self-adaptivity | Direct, non-degenerate empirical Bayes tuning | Many priors degenerate under EB |
| Grouping | Direct extension to grouped/multilevel shrinkage | GIGG: more complex hierarchy needed |

References to Key Conceptual Formulas

  • Beta prime density on the variance: $\pi(\tau) \propto \tau^{a-1}(1+\tau)^{-(a+b)}$
  • Shrinkage mixture: $\theta \mid \tau \sim N(0, \tau)$, $\tau \sim \text{Beta}'(a, b)$
  • Posterior updates for variational Bayes: see the explicit forms for the Gaussian, GIG, and Gamma updates in (1107.4976)
  • Empirical Bayes MML update equation: $-p\,\psi(a) + \sum_i \mathbb{E}[\log \lambda_i^2] = 0$, where $\psi$ denotes the digamma function
  • Thresholding rule for multiple testing: $\mathbb{E}(1 - \kappa_i \mid X_i) > 1/2$

Conclusion

The Normal Beta Prime prior provides a mathematically and computationally robust foundation for a wide range of problems in high-dimensional statistics, including regression, hypothesis testing, and nonparametric inference. Its unified framework for shrinkage, adaptivity, and computational scalability is supported by a comprehensive theoretical and empirical literature, making it a central tool in contemporary Bayesian analysis.