Normal-Beta Prime Prior
- The Normal Beta Prime (NBP) prior is a continuous shrinkage prior using a normal scale mixture with a beta prime distribution on the variance component, unifying priors like the horseshoe.
- It provides tunable hyperparameters for sparsity and tail robustness, achieving theoretical optimality and computational tractability for high-dimensional Bayesian models.
- Applications include high-dimensional regression for variable selection and regularization, multiple hypothesis testing, and connections to nonparametric Bayesian methods.
The Normal Beta Prime (NBP) prior is a continuous shrinkage prior constructed by mixing a normal distribution with a beta prime distribution on its variance component. This form provides a unifying, highly flexible family of priors for regularization and variable selection in high-dimensional Bayesian models, subsuming well-known priors such as the horseshoe, Strawderman-Berger, and normal-exponential-gamma as special or limiting cases. The NBP prior is characterized by tunable hyperparameters controlling both the degree of sparsity and tail robustness, providing strong theoretical properties and computational tractability crucial for modern large-scale applications.
1. Mathematical Definition and Hierarchical Construction
The NBP prior is formally obtained by representing a parameter of interest (often a regression coefficient) as a scale mixture of normals:

$$\theta_i \mid \omega_i^2 \sim \mathcal{N}(0, \omega_i^2), \qquad \omega_i^2 \sim \beta'(a, b, \phi),$$

where $\beta'(a, b, \phi)$ denotes the scaled beta prime (also called inverted beta) distribution with density

$$f(x; a, b, \phi) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\frac{1}{\phi}\left(\frac{x}{\phi}\right)^{a-1}\left(1 + \frac{x}{\phi}\right)^{-(a+b)}, \qquad x > 0.$$

For $\phi = 1$, this reduces to the standard beta prime. The NBP prior is thus a normal scale mixture where the mixing distribution on the variance is beta prime.
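As a quick illustration, here is a minimal sketch (Python with NumPy; our own construction, not any reference implementation) of sampling from the NBP prior, using the fact that a ratio of independent gammas $G_a/G_b$, with $G_a \sim \mathrm{Gamma}(a, 1)$ and $G_b \sim \mathrm{Gamma}(b, 1)$, is $\beta'(a, b)$-distributed; the scale $\phi$ simply multiplies the variance draw.

```python
import numpy as np

def sample_nbp(n, a, b, phi=1.0, seed=None):
    """Draw n samples of theta ~ NBP(a, b, phi) plus the mixing variances.

    Uses the gamma-ratio representation of the beta prime: if
    G_a ~ Gamma(a, 1) and G_b ~ Gamma(b, 1) are independent, then
    G_a / G_b ~ BetaPrime(a, b).
    """
    rng = np.random.default_rng(seed)
    omega2 = phi * rng.gamma(a, size=n) / rng.gamma(b, size=n)
    return rng.normal(0.0, np.sqrt(omega2)), omega2

# Horseshoe special case: a = b = 1/2
theta, _ = sample_nbp(100_000, a=0.5, b=0.5, seed=0)
print("P(|theta| < 0.01) =", np.mean(np.abs(theta) < 0.01))  # mass piled near zero
print("P(|theta| > 10)   =", np.mean(np.abs(theta) > 10))    # non-negligible heavy tails
```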
Equivalently, this prior can be expressed via a three-parameter beta (TPB) distribution (Armagan, Dunson, Clyde 2011), or hierarchically as a gamma mixture of gammas (rate parameterization):

$$\omega_i^2 \mid \lambda_i \sim \mathrm{Gamma}(a, \lambda_i), \qquad \lambda_i \sim \mathrm{Gamma}(b, \phi),$$

which marginally yields $\omega_i^2 \sim \beta'(a, b, \phi)$. Since the beta prime is infinitely divisible and admits convolution identities, these representations are analytically convenient and support efficient posterior computation via Gibbs sampling and variational Bayes (1107.4976).
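The equivalence of the two representations is easy to verify by simulation; a small self-contained check (our sketch, using SciPy's two-sample Kolmogorov-Smirnov test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, phi = 0.7, 1.3, 2.0
n = 200_000

# Hierarchical draw: omega2 | lam ~ Gamma(a, rate=lam), lam ~ Gamma(b, rate=phi)
lam = rng.gamma(b, 1.0 / phi, size=n)   # NumPy's gamma takes scale = 1/rate
omega2_hier = rng.gamma(a, 1.0 / lam)

# Direct draw: scaled beta prime as a ratio of independent gammas
omega2_direct = phi * rng.gamma(a, size=n) / rng.gamma(b, size=n)

# Large p-value => no detectable difference between the two distributions
print(stats.ks_2samp(omega2_hier, omega2_direct))
```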
2. Connections to Other Shrinkage Priors
The NBP prior unifies several well-known priors through specific choices of hyperparameters:
- Horseshoe prior: $a = 1/2$, $b = 1/2$
- Strawderman-Berger prior: $a = 1$, $b = 1/2$, $\phi = 1$
- Normal-Exponential-Gamma (NEG): $a = 1$, $b = \lambda > 0$
- Cauchy and Laplace (double exponential): arise as boundary or limiting cases
This encompasses both spike-at-zero/strong-shrinkage behavior (for small $a$) and heavy-tailed robustness (for small $b$), overcoming the limitations of traditional heavy-tailed or Laplace priors, which either lack an infinite spike at zero or cannot robustly accommodate large coefficients (1107.4976).
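A short numerical illustration of the two regimes, computing the marginal NBP density $\pi(\theta) = \int_0^\infty \mathcal{N}(\theta; 0, \omega^2)\,\beta'(\omega^2; a, b)\,d\omega^2$ by quadrature (a sketch, with $\phi = 1$ assumed):

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln

def nbp_marginal(theta, a, b):
    """Marginal NBP density at theta, by numerical integration over omega^2."""
    logC = gammaln(a + b) - gammaln(a) - gammaln(b)
    def integrand(w2):
        return (np.exp(logC) * w2**(a - 1) * (1 + w2)**(-(a + b))
                * np.exp(-theta**2 / (2 * w2)) / np.sqrt(2 * np.pi * w2))
    return integrate.quad(integrand, 0, np.inf, limit=200)[0]

for a, b in [(0.5, 0.5), (0.25, 0.5), (0.5, 0.1)]:
    spike = nbp_marginal(0.01, a, b)                             # grows as a drops below 1/2
    tails = nbp_marginal(20.0, a, b) / nbp_marginal(10.0, a, b)  # decays slower as b shrinks
    print(f"a={a}, b={b}: pi(0.01)={spike:.2f}, pi(20)/pi(10)={tails:.3f}")
```

For reference, a standard Gaussian prior would give $\pi(20)/\pi(10) \approx e^{-150}$; the polynomial NBP tails keep this ratio at a modest constant.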
3. Theoretical Properties and Practical Implications
The core theoretical innovations and practical implications are:
- Sparsity Inducing: For $a \le 1/2$, the marginal density of $\theta_i$ at zero is infinite, promoting strong shrinkage of small coefficients.
- Heavy-Tail Robustness: Small $b$ yields heavy polynomial tails, reducing overshrinkage of large signals and protecting true nonzero coefficients (see the derivation sketched after this list).
- Hyperparameter Flexibility: The scale $\phi$ governs global shrinkage intensity, while $a$ and $b$ directly control spike and tail behavior. This enables the prior to be tuned or estimated from data for adaptive regularization (1807.06539).
- Posterior Contraction: For high-dimensional regression with $p \gg n$ and underlying sparsity, careful selection (or adaptive estimation) of $a$ and $b$ achieves (near) minimax posterior contraction rates, enabling optimal inference in both sparse and dense settings. The NBP prior is "self-adaptive": empirical Bayes or MML estimation always produces positive (non-degenerate) hyperparameter estimates (1807.06539).
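Both the spike and the tail claims follow directly from the mixture integral; a brief sketch (with $\phi = 1$):

```latex
\pi(\theta) \;=\; \int_0^\infty \frac{e^{-\theta^2/(2\omega^2)}}{\sqrt{2\pi\omega^2}}
  \cdot \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,
  (\omega^2)^{a-1}\,(1+\omega^2)^{-(a+b)} \, d\omega^2 .
% At theta = 0 the integrand behaves like (omega^2)^{a-3/2} near the
% origin, which is integrable iff a > 1/2; hence pi(0) = infinity for
% a <= 1/2 (the spike at zero).
% For large |theta|, the substitution t = 1/omega^2 gives a Laplace-type
% integral with polynomial decay  pi(theta) ~ |theta|^{-(2b+1)},
% so smaller b means heavier tails (b = 1/2 gives Cauchy-like tails).
```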
4. Computational Methods and Scalability
The conjugate structure of the NBP prior supports efficient inference:
- Gibbs Sampling: Full conditionals are analytically tractable due to conjugacy, supporting reliable MCMC even for large $p$ (1107.4976). A minimal sampler sketch follows this list.
- Variational Bayes (VB): The normal-beta prime mixture admits closed-form coordinate updates, enabling scalable VB approximations with vectorized and parallelized implementations for massive $p$ (1107.4976).
- EM and MCEM Algorithms: For empirical Bayes estimation of hyperparameters via marginal maximum likelihood, the EM algorithm exploits the beta prime's gamma/inverse-gamma mixture representations. Monte Carlo EM (MCEM) and mean-field variational Bayes EM (MFVB) variants are available and implemented in the R package `NormalBetaPrime`. Unlike other global-local priors, hyperparameter estimation for the NBP does not degenerate to zero (1807.06539).
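To make the conjugate structure concrete, here is a minimal sketch of such a Gibbs sampler for the normal means model $y_i = \theta_i + \varepsilon_i$, $\varepsilon_i \sim \mathcal{N}(0, 1)$, with $a$, $b$, $\phi$ held fixed (our own illustration, not the `NormalBetaPrime` implementation; unit error variance is assumed). It uses the gamma-gamma hierarchy of Section 1: the full conditional for $\omega_i^2$ is generalized inverse Gaussian (GIG) and that for $\lambda_i$ is gamma.

```python
import numpy as np
from scipy.stats import geninvgauss

def nbp_gibbs(y, a=0.5, b=0.5, phi=1.0, n_iter=2000, seed=0):
    """Gibbs sampler for y_i = theta_i + N(0,1) noise, theta_i ~ NBP(a, b, phi).

    Hierarchy: theta_i ~ N(0, omega2_i), omega2_i | lam_i ~ Gamma(a, rate=lam_i),
    lam_i ~ Gamma(b, rate=phi). All three full conditionals are standard.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    omega2, lam = np.ones(n), np.ones(n)
    thetas, omega2s = [], []
    for _ in range(n_iter):
        # theta_i | omega2_i, y_i ~ N(v * y_i, v) with v = omega2 / (1 + omega2)
        v = omega2 / (1.0 + omega2)
        theta = rng.normal(v * y, np.sqrt(v))
        # omega2_i | theta_i, lam_i ~ GIG(p = a - 1/2, alpha = 2*lam_i, beta = theta_i^2),
        # density prop. to x^(p-1) exp(-(alpha*x + beta/x)/2)
        beta = np.maximum(theta**2, 1e-10)  # guard against beta = 0
        alpha = 2.0 * lam
        omega2 = geninvgauss.rvs(a - 0.5, np.sqrt(alpha * beta),
                                 scale=np.sqrt(beta / alpha), random_state=rng)
        # lam_i | omega2_i ~ Gamma(a + b, rate = omega2_i + phi)
        lam = rng.gamma(a + b, 1.0 / (omega2 + phi))
        thetas.append(theta)
        omega2s.append(omega2)
    return np.array(thetas), np.array(omega2s)

# Sparse truth: 5 signals at 8, 95 noise coordinates
rng = np.random.default_rng(1)
truth = np.concatenate([np.full(5, 8.0), np.zeros(95)])
y = truth + rng.normal(size=100)
thetas, _ = nbp_gibbs(y)
post_mean = thetas[500:].mean(axis=0)  # drop burn-in
print("posterior means (signals):", post_mean[:5].round(2))
print("max |posterior mean| (noise):", np.abs(post_mean[5:]).max().round(2))
```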
For grouped regression, recent advances (GRASP (2506.18092)) extend the NBP to both local and grouped shrinkage parameters, enabling hierarchical modeling of sparsity and robustness both across and within groups, with hyperparameters tunable or estimable within a single Gibbs + Metropolis-Hastings framework.
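One plausible shape for such a grouped hierarchy (an illustrative sketch only; the exact GRASP parameterization is given in (2506.18092)):

```latex
\beta_{g,j} \mid \gamma_g^2,\, \omega_{g,j}^2 \;\sim\; \mathcal{N}\!\left(0,\; \gamma_g^2\, \omega_{g,j}^2\right),
\qquad
\omega_{g,j}^2 \sim \beta'(a, b),
\qquad
\gamma_g^2 \sim \beta'(c, d).
% gamma_g^2 shrinks group g as a whole; omega_{g,j}^2 acts locally
% within the group, so sparsity and robustness operate at both levels.
```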
5. Applications in Regression, Multiple Testing, and Nonparametric Bayes
High-dimensional Regression:
- Empirical results on both synthetic and real datasets (e.g., gene expression) show that the NBP prior provides strong variable selection and predictive performance superior or comparable to the horseshoe, spike-and-slab, MCP, SCAD, the elastic net, and others, adapting to the degree of sparsity present (1807.06539, 2506.18092).
- Adaptive hyperparameter estimation allows the prior to become sparser in sparse settings and more diffuse (ridge-like) in dense regimes; a toy illustration follows.
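As a toy illustration of this adaptivity (a sketch, not the MML algorithm of (1807.06539)): in the normal means model, marginalizing $\theta_i$ gives $y_i \mid \omega_i^2 \sim \mathcal{N}(0, 1 + \omega_i^2)$, so $(a, b)$ can be chosen by maximizing the marginal likelihood over a small grid:

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln

def log_marginal(y, a, b):
    """Sum of log marginal densities of y_i under the NBP normal means model."""
    logC = gammaln(a + b) - gammaln(a) - gammaln(b)
    def marg_one(yi):
        f = lambda w2: (np.exp(logC) * w2**(a - 1) * (1 + w2)**(-(a + b))
                        * np.exp(-yi**2 / (2 * (1 + w2)))
                        / np.sqrt(2 * np.pi * (1 + w2)))
        return integrate.quad(f, 0, np.inf, limit=200)[0]
    return sum(np.log(marg_one(yi)) for yi in y)

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(6, 1, size=5), rng.normal(0, 1, size=95)])  # sparse truth
grid = [0.1, 0.25, 0.5, 1.0]
best = max(((a, b) for a in grid for b in grid), key=lambda ab: log_marginal(y, *ab))
print("empirical Bayes choice of (a, b):", best)  # sparser data favors smaller a
```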
Multiple Hypothesis Testing:
- In large-scale multiple testing of normal means, thresholding the posterior shrinkage weight under the NBP prior achieves asymptotically optimal Bayes risk (the exact "ABOS" property) when the hyperparameter tracks the true sparsity level. Empirical Bayes, REML, and hierarchical Bayes approaches to estimating this hyperparameter enable adaptive procedures with the same theoretical guarantees across the full range of sparsity (1807.02421).
- Via continuous shrinkage, the NBP prior allows testing to be conducted by simple thresholding of posterior quantities (sketched below), without resorting to explicit point-mass or "two-group" spike-and-slab modeling, yet it retains oracle-minimax risk.
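A minimal sketch of such a thresholding rule, reusing posterior draws of $\omega_i^2$ (e.g., from the Gibbs sampler in Section 4). The $1/2$ cutoff on the posterior mean of $1 - \kappa_i$, with $\kappa_i = 1/(1+\omega_i^2)$, is the standard rule for continuous shrinkage priors and is assumed here:

```python
import numpy as np

def classify_signals(omega2_draws, cutoff=0.5):
    """Flag coordinate i as a signal when E[1 - kappa_i | y] > cutoff,
    where kappa_i = 1 / (1 + omega2_i) is the posterior shrinkage factor.

    omega2_draws: array of shape (n_mcmc, n) of posterior draws of omega2.
    """
    one_minus_kappa = omega2_draws / (1.0 + omega2_draws)
    return one_minus_kappa.mean(axis=0) > cutoff

# Example with the sampler sketched in Section 4:
#   _, omega2s = nbp_gibbs(y)
#   signals = classify_signals(omega2s[500:])   # drop burn-in
```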
Nonparametric Bayesian Modeling:
- The negative binomial process (NBP)—distinct from normal-beta prime but sharing the acronym—has been used to define random discrete probability measures, generalizing the Dirichlet and Poisson-Dirichlet processes. The resulting family can represent a wide class of nonparametric priors, offering increased flexibility in modeling over-dispersed or clustered data by controlling an extra trimming parameter (2307.00176). While not the same as the scale-mixture normal-beta prime prior, this illustrates the wider influence of beta prime-related constructions in Bayesian nonparametrics.
6. Mathematical Properties and Structural Insights
Distributional and Analytical Identities:
- The beta prime distribution admits identities in law for its convolution and scaling (e.g., sum of independent beta primes involving product or ratio forms). These yield monotonicity and complete-monotonicity (CM/LCM) properties for associated Laplace and hypergeometric transforms, which are important in establishing infinite divisibility and stochastic ordering.
- The Laplace transform of the NBP prior is tied to the confluent hypergeometric function of the second kind and supports probabilistic proof of inequalities and analytic properties critical in Bayesian inference and shrinkage estimation (2108.09244).
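As a concrete instance of this hypergeometric connection, the NBP marginal (with $\phi = 1$) can be written as $\pi(\theta) = \frac{\Gamma(a+b)\,\Gamma(b+1/2)}{\sqrt{2\pi}\,\Gamma(a)\,\Gamma(b)}\, U\!\left(b+\tfrac12,\ \tfrac32 - a,\ \tfrac{\theta^2}{2}\right)$ via the standard integral representation of Tricomi's function $U$; this specific identity is our reconstruction rather than a formula quoted from (2108.09244), and can be checked numerically:

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln, hyperu

a, b, theta = 0.5, 0.5, 1.7  # horseshoe case, arbitrary test point
logC = gammaln(a + b) - gammaln(a) - gammaln(b)

# Direct quadrature of the normal scale mixture
f = lambda w2: (np.exp(logC) * w2**(a - 1) * (1 + w2)**(-(a + b))
                * np.exp(-theta**2 / (2 * w2)) / np.sqrt(2 * np.pi * w2))
direct = integrate.quad(f, 0, np.inf, limit=200)[0]

# Closed form via the confluent hypergeometric function of the second kind
closed = (np.exp(logC + gammaln(b + 0.5)) / np.sqrt(2 * np.pi)
          * hyperu(b + 0.5, 1.5 - a, theta**2 / 2))
print(direct, closed)  # should agree to quadrature accuracy
```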
Geometric and Random Matrix Interpretations:
- The multivariate beta prime (used for NBP-type priors on $\mathbb{R}^d$) is rotationally invariant, heavy-tailed, and closed under projections. This ensures the prior remains in the same class under marginalization and is suitable in high-dimensional and geometric probability contexts (2501.00671).
- Free probability analogues (free beta prime) relate the NBP prior to spectral limits in random matrix theory and enrich connections to operator-valued and non-commutative priors (1906.00661).
7. Computational Guarantees and MCMC Ergodicity
Recent results prove geometric ergodicity of Gibbs samplers for the normal model with global-local shrinkage priors, including the NBP prior, under weaker negative-moment conditions on the global scale parameter than previous work required (2503.00538). This guarantees fast, reliable MCMC convergence for practically relevant NBP parameterizations, broadening the class of priors with theoretically guaranteed computational stability.
Table: Summary of NBP Prior Features in Bayesian Regression
| Feature | NBP Prior | Comparison |
|---|---|---|
| Shrinkage at 0 | Spike at zero (for $a \le 1/2$) | Laplace: spike but lighter tails |
| Heavy tails | Power-law; $b$ controls tails | Horseshoe: similar for particular $a, b$ |
| Adaptivity | $a, b$ learned from data, full sparse-to-dense spectrum | Horseshoe: often fixed |
| Computation | Conjugate hierarchy, efficient Gibbs/VB | Many non-conjugate alternatives |
| Minimaxity | Near-minimax contraction rates, formal risk optimality | Spike-and-slab: only under discrete selection |
| Self-adaptivity | Direct, non-degenerate empirical Bayes tuning | Many priors degenerate under EB |
| Grouping | Direct extension to grouped/multilevel shrinkage | GIGG: more complex hierarchy needed |
References to Key Conceptual Formulas
- Beta prime density on variance: $\pi(\omega^2) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,(\omega^2)^{a-1}(1+\omega^2)^{-(a+b)}$, $\omega^2 > 0$
- Shrinkage mixture: $\theta_i \mid \omega_i^2 \sim \mathcal{N}(0, \omega_i^2)$, $\omega_i^2 \sim \beta'(a, b)$
- Posterior updates for variational Bayes: see the explicit forms for the Gaussian, GIG, and Gamma updates in (1107.4976)
- Empirical Bayes MML update equations: see (1807.06539)
- Thresholding rule for multiple testing: declare $\theta_i$ a signal when $\mathbb{E}[1 - \kappa_i \mid y] > 1/2$, with $\kappa_i = 1/(1+\omega_i^2)$ (1807.02421)
Conclusion
The Normal Beta Prime prior provides a mathematically and computationally robust foundation for a wide range of problems in high-dimensional statistics, including regression, hypothesis testing, and nonparametric inference. Its unified framework for shrinkage, adaptivity, and computational scalability is supported by a comprehensive theoretical and empirical literature, making it a central tool in contemporary Bayesian analysis.