
Bayesian Group Global-Local Shrinkage Prior

Updated 20 November 2025
  • The Bayesian group global-local shrinkage prior is a flexible hierarchical model that uses group-specific local scales and a global parameter for adaptive variable selection in high dimensions.
  • It employs a polynomial-tailed modification to strongly shrink noise while retaining large signals, ensuring robust group selection and optimal estimation.
  • The approach features an efficient half-thresholding rule that outperforms traditional spike-and-slab and group LASSO methods in both empirical and theoretical studies.

The Bayesian group global-local shrinkage prior is a flexible class of hierarchical priors developed to address high-dimensional variable selection and estimation problems where covariates or coefficients are structured in groups. Building on the success of continuous global-local shrinkage approaches such as the horseshoe, these priors enable simultaneous adaptation to group-level sparsity and signal strength by assigning each group a local scale parameter that interacts multiplicatively with a global shrinkage parameter. This construction yields strong shrinkage for groups with negligible effects, while preserving estimation accuracy for groups that contain genuine signal, and admits polynomial tails for detection of large effects. A salient variant of this framework uses a “modified global-local” structure that induces polynomially decaying tails for the group coefficients, optimally balancing the need to shrink noise while retaining prominence for large signals. Theoretical and empirical investigations document favorable selection and estimation properties, with performance rivaling or exceeding canonical two-group spike-and-slab approaches in group selection under high-dimensional scaling (Paul et al., 2023).

1. Hierarchical Model Formulation

Let $y \in \mathbb{R}^n$ be the response and $X = [X_1, \dots, X_G]$ the design matrix concatenated from $G$ groups ($X_g$ of size $n \times m_g$). The target coefficients are partitioned as $\beta = (\beta_1, \ldots, \beta_G)$ with $\beta_g \in \mathbb{R}^{m_g}$, $\sum_g m_g = p$. The Gaussian linear model is

$$y \mid X, \beta, \sigma^2 \sim N(X\beta,\, \sigma^2 I_n).$$

The group global-local shrinkage prior adopts the following form ('global-local g-prior'):

$$\begin{aligned} \beta_g \mid \lambda_g, \tau, \sigma^2 &\sim N_{m_g}\!\left(0,\ \sigma^2 \tau^2 \lambda_g^2 (X_g^\top X_g)^{-1}\right), \\ \lambda_g^2 &\sim \pi(\lambda_g^2), \quad \text{where } \pi(\lambda_g^2) \propto (\lambda_g^2)^{-a-1} L(\lambda_g^2), \quad a > 0, \end{aligned}$$

with $L$ slowly varying. The global scale $\tau$ is either set as a tuning parameter when the group sparsity level is known, or assigned a prior (full or empirical Bayes estimation) such as a truncated half-Cauchy. The variance $\sigma^2$ is given a Jeffreys' prior ($\propto 1/\sigma^2$) in practice, or is sometimes fixed for theoretical analysis.

Conditionally on the scales, the prior density of $\beta_g$ is thus explicitly

$$\pi(\beta_g \mid \lambda_g, \tau, \sigma^2) = \left|2\pi \sigma^2 \tau^2 \lambda_g^2 (X_g^\top X_g)^{-1}\right|^{-1/2} \exp\left\{-\frac{\beta_g^\top (X_g^\top X_g)\, \beta_g}{2\, \sigma^2 \tau^2 \lambda_g^2}\right\}.$$
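To make the hierarchy concrete, here is a minimal NumPy sketch that draws one group's coefficients from this conditional prior; the half-Cauchy draw for $\lambda_g$ (giving the horseshoe special case, $a = 1/2$) and the fixed values of $\tau$ and $\sigma^2$ are illustrative assumptions, not the only choices the framework admits.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_group_coefficients(X_g, tau, sigma2, rng):
    """Draw beta_g ~ N(0, sigma^2 tau^2 lambda_g^2 (X_g'X_g)^{-1})
    with a half-Cauchy local scale lambda_g (horseshoe case, a = 1/2)."""
    m_g = X_g.shape[1]
    lam_g = np.abs(rng.standard_cauchy())  # lambda_g ~ C+(0, 1), illustrative
    cov = sigma2 * tau**2 * lam_g**2 * np.linalg.inv(X_g.T @ X_g)
    return rng.multivariate_normal(np.zeros(m_g), cov)

# Toy example: one group with m_g = 3 covariates, n = 50 observations.
X_g = rng.standard_normal((50, 3))
beta_g = draw_group_coefficients(X_g, tau=0.1, sigma2=1.0, rng=rng)
print(beta_g)
```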

2. Polynomial-tailed Modification and Tail Properties

The critical feature distinguishing the Bayesian group global-local shrinkage prior is its polynomial-tailed structure on the group coefficient vector. Specifically, the local scales $\lambda_g^2$ are described by

$$\pi(\lambda_g^2) \propto (\lambda_g^2)^{-a-1} L(\lambda_g^2), \quad a > 0,$$

with $L$ Karamata slowly varying. The resulting marginal prior on $\beta_g$ is

$$p(\beta_g) \propto |X_g^\top X_g|^{1/2} \left[\beta_g^\top X_g^\top X_g\, \beta_g\right]^{-(a + m_g/2)} L\!\left(\beta_g^\top X_g^\top X_g\, \beta_g\right),$$

inducing heavy (polynomial) tails. The exponent $a$ directly modulates the tail decay: small $a$ yields heavier tails, which encourages concentration of mass at zero but does not overly penalize large signals. Special cases include the horseshoe prior and other 'one-group' polynomial-tailed forms as in Tang et al. (2018) (Paul et al., 2023).
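As a quick illustrative check (not from the paper), the polynomial tail can be verified by Monte Carlo in the scalar special case $m_g = 1$, $X_g^\top X_g = 1$, $\tau = \sigma = 1$: a half-Cauchy local scale corresponds to $a = 1/2$, so $P(|\beta| > t)$ should decay roughly like $t^{-2a} = t^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# beta | lambda ~ N(0, lambda^2), lambda ~ C+(0, 1): horseshoe, a = 1/2.
lam = np.abs(rng.standard_cauchy(size=1_000_000))
beta = rng.standard_normal(1_000_000) * lam

t = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
tail = np.array([(np.abs(beta) > ti).mean() for ti in t])

# Empirical tail index: slope of log P(|beta| > t) vs log t,
# expected near -2a = -1 for the horseshoe.
slope = np.polyfit(np.log(t), np.log(tail), 1)[0]
print(tail)   # roughly halves each time t doubles
print(slope)  # close to -1
```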

3. Selection via the Half-Thresholding Rule

A distinguishing feature is an explicit, computationally tractable selection rule. For a block-orthogonal design ($X_g^\top X_h = 0$ for $g \ne h$), the posterior mean of each group factors as

$$\mathbb{E}(\beta_g \mid D, \tau, \sigma^2) = \left(1 - \mathbb{E}[K_g \mid D, \tau, \sigma^2]\right) \hat{\beta}_g^{OLS}, \qquad K_g = \frac{1}{1 + \tau^2 \lambda_g^2}.$$

Let $s_g = \mathbb{E}(1 - K_g \mid D)$ denote the shrinkage factor. The half-thresholding rule declares group $g$ active if

$$s_g > \frac{1}{2} \quad \Longleftrightarrow \quad \frac{\left\|\mathbb{E}(\beta_g \mid D)\right\|_2}{\left\|\hat{\beta}_g^{OLS}\right\|_2} > \frac{1}{2}.$$

This threshold rule is fully specified by the posterior mean, requiring no marginal likelihood computation or combinatorial search, and is adaptive to signal strength and group size (Paul et al., 2023).
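In practice the rule needs only the group posterior means and OLS estimates, as in this minimal sketch (inputs are assumed precomputed, e.g., from an MCMC run; names are illustrative):

```python
import numpy as np

def half_threshold_select(post_means, ols_estimates):
    """Declare group g active iff ||E(beta_g | D)|| / ||beta_g_OLS|| > 1/2.

    post_means, ols_estimates: lists of 1-D arrays, one per group.
    Returns the set of selected group indices.
    """
    selected = set()
    for g, (pm, ols) in enumerate(zip(post_means, ols_estimates)):
        s_g = np.linalg.norm(pm) / np.linalg.norm(ols)  # shrinkage factor s_g
        if s_g > 0.5:
            selected.add(g)
    return selected

# Toy usage: group 0 barely shrunk (signal), group 1 heavily shrunk (noise).
post = [np.array([0.9, 1.1]), np.array([0.01, 0.02])]
ols  = [np.array([1.0, 1.2]), np.array([0.5, 0.4])]
print(half_threshold_select(post, ols))  # {0}
```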

4. Global Scale ($\tau$) Selection Strategies

The choice of the global shrinkage parameter $\tau$ is pivotal for controlling the trade-off between bias and variance:

  • Known sparsity: If the proportion of active groups, $\pi = G_A/G$, is known, a near-optimal choice is $\tau_n = (G_A/G)^{2+\delta}$ for small $\delta > 0$.
  • Empirical Bayes: When sparsity is unknown, an empirical Bayes estimator (after van der Pas et al.) is used:

$$\hat{\tau}_{EB} = \max\left\{G^{-1},\ \frac{1}{c_2 G} \sum_{g=1}^G \mathbf{1}\left\{n\, \hat{\beta}_g^\top Q_{n,g}\, \hat{\beta}_g / \sigma^2 > c_1 \ln G\right\}\right\},$$

with $Q_{n,g} = X_g^\top X_g / n$, $c_1 \geq 2$, $c_2 \geq 1$.

  • Full Bayes: A truncated half-Cauchy prior, $\pi(\tau) = \mathrm{C}^+(0,1)$ restricted to $[G^{-1-\delta},\ G^{-1-\delta}\ln G]$, ensures that $\tau$ concentrates in the oracle regime.

Adaptation to unknown sparsity is thus achieved without combinatorial model enumeration (Paul et al., 2023).
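The empirical Bayes rule above transcribes directly into code; in this sketch the boundary choices $c_1 = 2$, $c_2 = 1$ and the assumption that the group OLS estimates $\hat{\beta}_g$ are precomputed are illustrative.

```python
import numpy as np

def tau_empirical_bayes(beta_hats, X_groups, sigma2, n, c1=2.0, c2=1.0):
    """Empirical Bayes global scale: the fraction of groups whose scaled
    OLS quadratic form exceeds c1 * ln(G), floored at 1/G."""
    G = len(beta_hats)
    count = 0
    for bh, Xg in zip(beta_hats, X_groups):
        Q = Xg.T @ Xg / n                          # Q_{n,g} = X_g'X_g / n
        if n * bh @ Q @ bh / sigma2 > c1 * np.log(G):
            count += 1
    return max(1.0 / G, count / (c2 * G))
```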

5. Theoretical Guarantees

Let $A = \{g : \beta_g^0 \ne 0\}$ denote the active set, $|A| = G_A$, and let the total number of groups be $G$, with $G_A = o(G)$.

  • Variable selection consistency: Under standard regularity conditions (group designs with bounded eigenvalues, non-vanishing signals, bounded group sizes) and suitable choices of $\tau_n$ (e.g., $G \tau_n^2 [\ln(1/\tau_n)]^{-1} \to 0$; see the numeric check after this list), the half-thresholding rule is selection consistent:

$$\mathbb{P}\left(\widehat{A}_{HT} = A\right) \to 1 \quad (n \to \infty).$$

  • Oracle estimation rates: For any unit vector $a$ with support in $A$, and under further eigenvalue and signal bounds, the estimator achieves asymptotic normality at the minimax-optimal rate:

$$a^\top(\hat{\beta}_A - \beta_A^0) \stackrel{d}{\to} N(0, \sigma^2).$$

  • These properties extend to the empirical Bayes and full Bayes strategies, requiring only mild technical modifications for $a \in (0,1)$ or alternative empirical $\tau$ selection (Paul et al., 2023).
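As a small numeric check of the selection-consistency condition referenced above (an illustration, not part of the paper's analysis): with the known-sparsity choice $\tau_n = (G_A/G)^{2+\delta}$ and, say, $G_A = \sqrt{G}$, the quantity $G \tau_n^2 [\ln(1/\tau_n)]^{-1}$ indeed vanishes as $G$ grows.

```python
import numpy as np

G = np.array([1e2, 1e3, 1e4, 1e5])     # number of groups
G_A = np.sqrt(G)                       # hypothetical sparsity with G_A = o(G)
tau = (G_A / G) ** 2.1                 # tau_n = (G_A/G)^{2+delta}, delta = 0.1
print(G * tau**2 / np.log(1.0 / tau))  # decreases toward 0 as G grows
```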

6. Empirical Performance and Method Comparisons

Extensive simulations were conducted across nine regimes (varying $n/p$, signal strength, group sizes, and design orthogonality). Principal comparators include:

  • Modified Group Horseshoe (MGH) and Group Horseshoe (GH)
  • Empirical Bayes MGH-EB1/EB2, Full Bayes MGH-FB
  • Two-group spike-&-slab (GSD-SSS, BGL-SS), Group LASSO

Performance metrics: Misclassification Probability (MP), False Positive Rate (FPR), True Positive Rate (TPR).
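These metrics compare the selected groups against the true active set; a small sketch under one standard convention (MP as the fraction of misclassified groups; the function name is illustrative):

```python
def selection_metrics(selected, truth, G):
    """Misclassification probability, false positive rate, true positive
    rate for a selected set of group indices versus the true active set."""
    selected, truth = set(selected), set(truth)
    fp = len(selected - truth)   # inactive groups declared active
    fn = len(truth - selected)   # active groups missed
    tp = len(selected & truth)
    mp = (fp + fn) / G
    fpr = fp / (G - len(truth)) if G > len(truth) else 0.0
    tpr = tp / len(truth) if truth else 1.0
    return mp, fpr, tpr

print(selection_metrics({0, 2}, {0, 1}, G=10))  # (0.2, 0.125, 0.5)
```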

Findings:

  • MGH (and GH) priors yield the lowest MP and FPR and the highest TPR, especially under weak or moderate signal regimes and smaller $n$.
  • Empirical Bayes and Full Bayes variants nearly match the oracle-tuned half-thresholding rule.
  • Two-group priors (GSD-SSS, BGL-SS) require stronger signal or larger $n$ to achieve similar performance.
  • Group LASSO tends to overselect (high FPR) except under strong signals or large $n$.

This demonstrates that one-group, polynomial-tailed global-local priors with the half-thresholding rule match or outperform classical two-group spike-and-slab or penalized likelihood group selection methods while simultaneously offering substantial computational and inferential simplicity (Paul et al., 2023).

7. Broader Context and Extensions

The group global-local paradigm extends naturally to multilevel and network-structured problems (e.g., multivariate responses (Kundu et al., 2019), multilevel models with joint control via Dirichlet or Beta-P distributions (Aguilar et al., 2022), gene network estimation (Leday et al., 2015), and network-based classification (Guha et al., 2020)). Each variant tailors the local scales to correspond to natural groupings and adapts the thresholding or selection scheme appropriately. Notably, the polynomial-tailed forms enable robust signal recovery in ultra-high-dimensional or weak-signal settings and facilitate practical model selection via continuous shrinkage without the need for discrete model search. In summary, the Bayesian group global-local shrinkage prior furnishes a unified, theoretically rigorous, and empirically validated approach to sparse estimation and group selection across a broad array of high-dimensional settings.
