
Bayesian Group Global-Local Shrinkage Prior

Updated 20 November 2025
  • The Bayesian group global-local shrinkage prior is a flexible hierarchical model that uses group-specific local scales and a global parameter for adaptive variable selection in high dimensions.
  • It employs a polynomial-tailed modification to strongly shrink noise while retaining large signals, ensuring robust group selection and optimal estimation.
  • The approach features an efficient half-thresholding rule that outperforms traditional spike-and-slab and group LASSO methods in both empirical and theoretical studies.

The Bayesian group global-local shrinkage prior is a flexible class of hierarchical priors developed to address high-dimensional variable selection and estimation problems where covariates or coefficients are structured in groups. Building on the success of continuous global-local shrinkage approaches such as the horseshoe, these priors enable simultaneous adaptation to group-level sparsity and signal strength by assigning each group a local scale parameter that interacts multiplicatively with a global shrinkage parameter. This construction yields strong shrinkage for groups with negligible effects, while preserving estimation accuracy for groups that contain genuine signal, and admits polynomial tails for detection of large effects. A salient variant of this framework uses a “modified global-local” structure that induces polynomially decaying tails for the group coefficients, optimally balancing the need to shrink noise while retaining prominence for large signals. Theoretical and empirical investigations document favorable selection and estimation properties, with performance rivaling or exceeding canonical two-group spike-and-slab approaches in group selection under high-dimensional scaling (Paul et al., 2023).

1. Hierarchical Model Formulation

Let $y \in \mathbb{R}^n$ be the response and $X = [X_1, \dots, X_G]$ the design matrix concatenated from $G$ groups ($X_g$ of size $n \times m_g$). The target coefficients are partitioned as $\beta = (\beta_1, \ldots, \beta_G)$ with $\beta_g \in \mathbb{R}^{m_g}$, $\sum_g m_g = p$. The Gaussian linear model is

$$y \mid X, \beta, \sigma^2 \sim N(X\beta,\, \sigma^2 I_n).$$

The group global-local shrinkage prior adopts the following form ('global-local g-prior'):

$$\begin{aligned} \beta_g \mid \lambda_g, \tau, \sigma^2 &\sim N_{m_g}\!\left(0,\ \sigma^2 \tau^2 \lambda_g^2 (X_g^\top X_g)^{-1}\right), \\ \lambda_g^2 &\sim \pi(\lambda_g^2), \quad \text{where } \pi(\lambda_g^2) \propto (\lambda_g^2)^{-a-1} L(\lambda_g^2), \quad a > 0, \end{aligned}$$

with $L$ slowly varying. The global scale $\tau$ is either set as a tuning parameter when the group sparsity level is known, or assigned a prior (full or empirical Bayes estimation) such as a truncated half-Cauchy. The variance $\sigma^2$ is given a Jeffreys' prior ($\propto 1/\sigma^2$) in practice, or is sometimes fixed for theoretical analysis.

Conditionally on the scales, the prior density of $\beta_g$ is thus explicitly

$$\pi(\beta_g \mid \lambda_g, \tau, \sigma^2) = \left|2\pi \sigma^2 \tau^2 \lambda_g^2 (X_g^\top X_g)^{-1}\right|^{-1/2} \exp\left\{-\frac{\beta_g^\top (X_g^\top X_g)\, \beta_g}{2\, \sigma^2 \tau^2 \lambda_g^2}\right\}.$$
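To make the hierarchy concrete, here is a minimal NumPy sketch that draws one group's coefficients from this conditional prior; the half-Cauchy draw for $\lambda_g$ (giving the horseshoe special case, $a = 1/2$) and the fixed values of $\tau$ and $\sigma^2$ are illustrative assumptions, not the only choices the framework admits.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_group_coefficients(X_g, tau, sigma2, rng):
    """Draw beta_g ~ N(0, sigma^2 tau^2 lambda_g^2 (X_g'X_g)^{-1})
    with a half-Cauchy local scale lambda_g (horseshoe case, a = 1/2)."""
    m_g = X_g.shape[1]
    lam_g = np.abs(rng.standard_cauchy())  # lambda_g ~ C+(0, 1), illustrative
    cov = sigma2 * tau**2 * lam_g**2 * np.linalg.inv(X_g.T @ X_g)
    return rng.multivariate_normal(np.zeros(m_g), cov)

# Toy example: one group with m_g = 3 covariates, n = 50 observations.
X_g = rng.standard_normal((50, 3))
beta_g = draw_group_coefficients(X_g, tau=0.1, sigma2=1.0, rng=rng)
print(beta_g)
```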

2. Polynomial-tailed Modification and Tail Properties

The critical feature distinguishing the Bayesian group global-local shrinkage prior is its polynomial-tailed structure on the group coefficient vector. Specifically, the local scales $\lambda_g^2$ are described by

$$\pi(\lambda_g^2) \propto (\lambda_g^2)^{-a-1} L(\lambda_g^2), \quad a > 0,$$

with $L$ Karamata slowly varying. The resulting marginal prior on $\beta_g$ is

$$p(\beta_g) \propto |X_g^\top X_g|^{1/2} \left[\beta_g^\top X_g^\top X_g\, \beta_g\right]^{-(a + m_g/2)} L\!\left(\beta_g^\top X_g^\top X_g\, \beta_g\right),$$

inducing heavy (polynomial) tails. The exponent $a$ directly modulates the tail decay: small $a$ yields heavier tails, which encourages concentration of mass at zero but does not overly penalize large signals. Special cases include the horseshoe prior and other 'one-group' polynomial-tailed forms as in Tang et al. (2018) (Paul et al., 2023).
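As a quick illustrative check (not from the paper), the polynomial tail can be verified by Monte Carlo in the scalar special case $m_g = 1$, $X_g^\top X_g = 1$, $\tau = \sigma = 1$: a half-Cauchy local scale corresponds to $a = 1/2$, so $P(|\beta| > t)$ should decay roughly like $t^{-2a} = t^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# beta | lambda ~ N(0, lambda^2), lambda ~ C+(0, 1): horseshoe, a = 1/2.
lam = np.abs(rng.standard_cauchy(size=1_000_000))
beta = rng.standard_normal(1_000_000) * lam

t = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
tail = np.array([(np.abs(beta) > ti).mean() for ti in t])

# Empirical tail index: slope of log P(|beta| > t) vs log t,
# expected near -2a = -1 for the horseshoe.
slope = np.polyfit(np.log(t), np.log(tail), 1)[0]
print(tail)   # roughly halves each time t doubles
print(slope)  # close to -1
```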

3. Selection via the Half-Thresholding Rule

A distinguishing feature is an explicit, computationally tractable selection rule. For a block-orthogonal design ($X_g^\top X_h = 0$ for $g \ne h$), the posterior mean of each group factors as

$$\mathbb{E}(\beta_g \mid D, \tau, \sigma^2) = \left(1 - \mathbb{E}[K_g \mid D, \tau, \sigma^2]\right) \hat{\beta}_g^{OLS}, \qquad K_g = \frac{1}{1 + \tau^2 \lambda_g^2}.$$

Let $s_g = \mathbb{E}(1 - K_g \mid D)$ denote the shrinkage factor. The half-thresholding rule declares group $g$ active if

$$s_g > \frac{1}{2} \quad \Longleftrightarrow \quad \frac{\left\|\mathbb{E}(\beta_g \mid D)\right\|_2}{\left\|\hat{\beta}_g^{OLS}\right\|_2} > \frac{1}{2}.$$

This threshold rule is fully specified by the posterior mean, requiring no marginal likelihood computation or combinatorial search, and is adaptive to signal strength and group size (Paul et al., 2023).
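In practice the rule needs only the group posterior means and OLS estimates, as in this minimal sketch (inputs are assumed precomputed, e.g., from an MCMC run; names are illustrative):

```python
import numpy as np

def half_threshold_select(post_means, ols_estimates):
    """Declare group g active iff ||E(beta_g | D)|| / ||beta_g_OLS|| > 1/2.

    post_means, ols_estimates: lists of 1-D arrays, one per group.
    Returns the set of selected group indices.
    """
    selected = set()
    for g, (pm, ols) in enumerate(zip(post_means, ols_estimates)):
        s_g = np.linalg.norm(pm) / np.linalg.norm(ols)  # shrinkage factor s_g
        if s_g > 0.5:
            selected.add(g)
    return selected

# Toy usage: group 0 barely shrunk (signal), group 1 heavily shrunk (noise).
post = [np.array([0.9, 1.1]), np.array([0.01, 0.02])]
ols  = [np.array([1.0, 1.2]), np.array([0.5, 0.4])]
print(half_threshold_select(post, ols))  # {0}
```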

4. Global Scale ($\tau$) Selection Strategies

The choice of the global shrinkage parameter $\tau$ is pivotal for controlling the trade-off between bias and variance:

  • Known sparsity: If the proportion of active groups, $\pi = G_A/G$, is known, a near-optimal choice is $\tau_n = (G_A/G)^{2+\delta}$ for small $\delta > 0$.
  • Empirical Bayes: When sparsity is unknown, an empirical Bayes estimator (after van der Pas et al.) is used:

$$\hat{\tau}_{EB} = \max\left\{G^{-1},\ \frac{1}{c_2 G} \sum_{g=1}^G \mathbf{1}\left\{n\, \hat{\beta}_g^\top Q_{n,g}\, \hat{\beta}_g / \sigma^2 > c_1 \ln G\right\}\right\},$$

with $Q_{n,g} = X_g^\top X_g / n$, $c_1 \geq 2$, $c_2 \geq 1$.

  • Full Bayes: A truncated half-Cauchy prior, $\pi(\tau) = \mathrm{C}^+(0,1)$ restricted to $[G^{-1-\delta},\ G^{-1-\delta}\ln G]$, ensures that $\tau$ concentrates in the oracle regime.

Adaptation to unknown sparsity is thus achieved without combinatorial model enumeration (Paul et al., 2023).
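The empirical Bayes rule above transcribes directly into code; in this sketch the boundary choices $c_1 = 2$, $c_2 = 1$ and the assumption that the group OLS estimates $\hat{\beta}_g$ are precomputed are illustrative.

```python
import numpy as np

def tau_empirical_bayes(beta_hats, X_groups, sigma2, n, c1=2.0, c2=1.0):
    """Empirical Bayes global scale: the fraction of groups whose scaled
    OLS quadratic form exceeds c1 * ln(G), floored at 1/G."""
    G = len(beta_hats)
    count = 0
    for bh, Xg in zip(beta_hats, X_groups):
        Q = Xg.T @ Xg / n                          # Q_{n,g} = X_g'X_g / n
        if n * bh @ Q @ bh / sigma2 > c1 * np.log(G):
            count += 1
    return max(1.0 / G, count / (c2 * G))
```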

5. Theoretical Guarantees

Let $A = \{g : \beta_g^0 \ne 0\}$ denote the active set, $|A| = G_A$, and let the total number of groups be $G$, with $G_A = o(G)$.

  • Variable selection consistency: Under standard regularity conditions (group designs with bounded eigenvalues, non-vanishing signals, bounded group sizes) and suitable choices of $\tau_n$ (e.g., $G \tau_n^2 [\ln(1/\tau_n)]^{-1} \to 0$; see the numeric check after this list), the half-thresholding rule is selection consistent:

$$\mathbb{P}\left(\widehat{A}_{HT} = A\right) \to 1 \quad (n \to \infty).$$

  • Oracle estimation rates: For any unit vector $a$ with support in $A$, and under further eigenvalue and signal bounds, the estimator achieves asymptotic normality at the minimax-optimal rate:

$$a^\top(\hat{\beta}_A - \beta_A^0) \stackrel{d}{\to} N(0, \sigma^2).$$

  • These properties extend to the empirical Bayes and full Bayes strategies, requiring only mild technical modifications for $a \in (0,1)$ or alternative empirical $\tau$ selection (Paul et al., 2023).
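As a small numeric check of the selection-consistency condition referenced above (an illustration, not part of the paper's analysis): with the known-sparsity choice $\tau_n = (G_A/G)^{2+\delta}$ and, say, $G_A = \sqrt{G}$, the quantity $G \tau_n^2 [\ln(1/\tau_n)]^{-1}$ indeed vanishes as $G$ grows.

```python
import numpy as np

G = np.array([1e2, 1e3, 1e4, 1e5])     # number of groups
G_A = np.sqrt(G)                       # hypothetical sparsity with G_A = o(G)
tau = (G_A / G) ** 2.1                 # tau_n = (G_A/G)^{2+delta}, delta = 0.1
print(G * tau**2 / np.log(1.0 / tau))  # decreases toward 0 as G grows
```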

6. Empirical Performance and Method Comparisons

Extensive simulations were conducted across nine regimes (varying $n/p$, signal strength, group sizes, and design orthogonality). Principal comparators include:

  • Modified Group Horseshoe (MGH) and Group Horseshoe (GH)
  • Empirical Bayes MGH-EB1/EB2, Full Bayes MGH-FB
  • Two-group spike-&-slab (GSD-SSS, BGL-SS), Group LASSO

Performance metrics: Misclassification Probability (MP), False Positive Rate (FPR), True Positive Rate (TPR).
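These metrics compare the selected groups against the true active set; a small sketch under one standard convention (MP as the fraction of misclassified groups; the function name is illustrative):

```python
def selection_metrics(selected, truth, G):
    """Misclassification probability, false positive rate, true positive
    rate for a selected set of group indices versus the true active set."""
    selected, truth = set(selected), set(truth)
    fp = len(selected - truth)   # inactive groups declared active
    fn = len(truth - selected)   # active groups missed
    tp = len(selected & truth)
    mp = (fp + fn) / G
    fpr = fp / (G - len(truth)) if G > len(truth) else 0.0
    tpr = tp / len(truth) if truth else 1.0
    return mp, fpr, tpr

print(selection_metrics({0, 2}, {0, 1}, G=10))  # (0.2, 0.125, 0.5)
```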

Findings:

  • MGH (and GH) priors yield the lowest MP and FPR and the highest TPR, especially under weak or moderate signal regimes and smaller $n$.
  • Empirical Bayes and Full Bayes variants nearly match the oracle-tuned half-thresholding rule.
  • Two-group priors (GSD-SSS, BGL-SS) require stronger signal or larger $n$ to achieve similar performance.
  • Group LASSO tends to overselect (high FPR) except under strong signals or large $n$.

This demonstrates that one-group, polynomial-tailed global-local priors with the half-thresholding rule match or outperform classical two-group spike-and-slab or penalized likelihood group selection methods while simultaneously offering substantial computational and inferential simplicity (Paul et al., 2023).

7. Broader Context and Extensions

The group global-local paradigm extends naturally to multilevel and network-structured problems (e.g., multivariate responses (Kundu et al., 2019), multilevel models with joint control via Dirichlet or Beta-P distributions (Aguilar et al., 2022), gene network estimation (Leday et al., 2015), and network-based classification (Guha et al., 2020)). Each variant tailors the local scales to correspond to natural groupings and adapts the thresholding or selection scheme appropriately. Notably, the polynomial-tailed forms enable robust signal recovery in ultra-high-dimensional or weak-signal settings and facilitate practical model selection via continuous shrinkage without the need for discrete model search. In summary, the Bayesian group global-local shrinkage prior furnishes a unified, theoretically rigorous, and empirically validated approach to sparse estimation and group selection across a broad array of high-dimensional settings.
