Bayesian Group-Sparse Regression (BGSR)

Updated 22 November 2025
  • BGSR is a family of hierarchical probabilistic models that enforce structured sparsity through adaptive priors at the group and within-group levels, such as spike-and-slab and global-local shrinkage priors.
  • It employs robust computational methods, including Gibbs samplers, EM, and variational inference, to deliver efficient estimation, screening, and uncertainty quantification.
  • Applications in genomics, imaging, and signal processing demonstrate BGSR's practical efficacy in support recovery, error minimization, and adaptive model selection.

Bayesian Group-Sparse Regression (BGSR) is a family of hierarchical probabilistic models and computational frameworks designed to impose, adapt to, and quantify uncertainty under structured sparsity in high-dimensional regression. Specifically, BGSR addresses problems where predictors naturally cluster into groups (as in genomics, imaging, signal processing, and many other modern applications) by inducing shrinkage and selection at both the group and within-group levels. BGSR encompasses a wide spectrum of priors (spike-and-slab, global-local shrinkage, normal-beta-prime (NBP), R2D2, projection posteriors), computational backends (MCMC, EM, variational inference, expectation propagation), and methodological objectives (estimation, screening, inference, uncertainty quantification), with strong theoretical support for both adaptability and frequentist optimality.

1. Model Structures: Hierarchies, Priors, and Likelihoods

BGSR frameworks are centered on the grouped linear (or generalized linear, multi-task, nonparametric) regression likelihood $y \mid X, \beta, \sigma^2 \sim \mathcal{N}(X\beta, \sigma^2 I_n)$, with the predictors partitioned into $G$ groups of sizes $m_1, \ldots, m_G$ and $\beta = (\beta_1^\top, \ldots, \beta_G^\top)^\top$.

Shrinkage and sparsity are induced by hierarchical priors:

  • Group Spike-and-Slab Priors: Each group $\beta_g$ is associated with an indicator $\gamma_g \in \{0,1\}$, with $\beta_g \mid \gamma_g \sim (1-\gamma_g)\,\delta_0 + \gamma_g\,\Psi(\cdot)$, where $\Psi$ is a slab (e.g., group-Laplace, Gaussian) and $\gamma_g \sim \mathrm{Bernoulli}(1-\pi_0)$ [$2007.07021$, $1512.01013$].
  • Continuous Shrinkage Priors: Group global-local schemes, e.g., the NBP/GRASP construction, assign

$$\beta_{gj} \mid \tau^2, \delta_g^2, \lambda_{gj}^2, \sigma^2 \sim \mathcal{N}\!\left(0,\ \tau^2 \delta_g^2 \lambda_{gj}^2 \sigma^2\right)$$

with half-Cauchy or BetaPrime hyperpriors on the scales [$2506.18092$].

  • Global-Local Shrinkage via Horseshoe or Group R2D2: Hyperpriors on group and within-group scales (e.g., group Dirichlet, logistic-normal for $R_g^2$ as the fraction of explained variance), yielding strong adaptability and heavy tails [$1709.04333$, $2412.15293$].
  • Bi-Level and Multilevel Selection: Extensions with nested indicators for within-group sparsity [$1512.01013$, $1809.09367$], used in imaging-genetics and categorical expansions.
  • Variance/Covariance Hierarchies: For multivariate $Y$ or multi-task settings, priors on the error covariance (inverse-Wishart, eigendecomposition, or task-specific scale mixtures) are imposed [$1807.03439$, $1605.02234$, $2105.10888$].

Underlying these is often an auxiliary-variable (scale-mixture) representation, facilitating efficient posterior computation.
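
To make the hierarchy concrete, the following is a minimal generative sketch of the group spike-and-slab formulation above, using an independent Gaussian slab for simplicity (a group-Laplace slab would work equally well); the function name and default settings are illustrative and not taken from any of the cited papers.

```python
import numpy as np

def simulate_group_spike_slab(n=200, group_sizes=(5, 5, 5, 5), pi0=0.5,
                              slab_sd=1.0, sigma=1.0, seed=0):
    """Generate (X, y, beta, gamma) from a group spike-and-slab linear model:
    gamma_g ~ Bernoulli(1 - pi0); beta_g = 0 if gamma_g = 0, else beta_g ~ N(0, slab_sd^2 I);
    y ~ N(X beta, sigma^2 I_n)."""
    rng = np.random.default_rng(seed)
    p = int(sum(group_sizes))
    groups = np.repeat(np.arange(len(group_sizes)), group_sizes)
    gamma = rng.binomial(1, 1.0 - pi0, size=len(group_sizes))   # group inclusion indicators
    beta = np.zeros(p)
    for g, active in enumerate(gamma):
        if active:                                              # slab draw for active groups only
            beta[groups == g] = rng.normal(0.0, slab_sd, size=np.sum(groups == g))
    X = rng.standard_normal((n, p))
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    return X, y, beta, gamma, groups
```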

2. Computational Algorithms and Implementation

Inference in BGSR hinges on scalable, theoretically justified algorithms:

  • Gibbs Samplers: Exploiting conditional conjugacy in Gaussian scale mixtures, Gibbs samplers for the group lasso, sparse group lasso, and global-local shrinkage priors have been derived [$1609.04057$, $1709.04333$, $2506.18092$]. Geometric ergodicity has been established both for three-block schemes (sequential updates of coefficients, scales, and noise) and for more efficient two-block schemes that combine $(\beta, \sigma^2)$ into a single update [$1903.06964$].
  • EM Algorithms: For spike-and-slab or group-lasso-like penalties, EM algorithms alternate between soft group assignment (E-step: active group probabilities) and penalized regression updates (M-step: weighted group lasso/sparse group lasso) [$2007.07021$, $1903.01979$].
  • Variational Inference: Mean-field or structured variational schemes (e.g., GSVB) match groupwise inclusion probabilities and means/covariances to the posterior, with blockwise coordinate optimization and empirical Bayes hyperparameter updates [$2309.10378$].
  • Expectation Propagation: Fast deterministic inference using site-specific moment-matching, designed for hierarchical group/within-group spike-and-slab with Gaussian likelihoods or for applications such as network reconstruction [$1809.09367$].
  • Sparse Projection-Posterior: Dense posteriors are sparsified via deterministic projection (group lasso, SCAD, adaptive group lasso) yielding induced posteriors with group support control and potentially exact coverage after de-biasing [$2411.15713$].
  • Selection-informed MCMC: Post-selection Bayesian inference integrates group selection events (e.g., after randomized group lasso) via a Bayesian likelihood adjustment, using Laplace approximations and Langevin/post-selection MCMC for valid credible intervals [$2012.15664$].

Specific implementation guidance includes the use of Cholesky/Woodbury identities for large design matrices, inverse-Gaussian samplers for auxiliary scales, and vectorized computations for group updates.
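
As an illustration of the scale-mixture Gibbs updates and of the Cholesky-based implementation advice above, here is a minimal sketch of a three-block sampler for the Bayesian group lasso (coefficients, then per-group scales, then the noise variance). It assumes a fixed penalty parameter lam and an improper 1/sigma^2 prior on the noise variance, and it omits the usual hyperprior update for lam as well as convergence diagnostics; all names are illustrative.

```python
import numpy as np

def bayes_group_lasso_gibbs(X, y, groups, lam=1.0, n_iter=2000, seed=0):
    """Sketch of a three-block Gibbs sampler for the Bayesian group lasso,
    beta_g | tau_g^2, sigma^2 ~ N(0, sigma^2 tau_g^2 I), with an improper 1/sigma^2 prior.
    `groups` is a length-p integer array assigning each column of X to a group."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gids = np.unique(groups)
    tau2 = np.ones(len(gids))                 # per-group mixing scales
    sigma2 = 1.0
    XtX, Xty = X.T @ X, X.T @ y
    draws = np.empty((n_iter, p))

    def d_inv(t2):                            # diagonal of D^{-1}, expanded to length p
        out = np.empty(p)
        for k, g in enumerate(gids):
            out[groups == g] = 1.0 / t2[k]
        return out

    for it in range(n_iter):
        # Block 1: beta | tau2, sigma2, y ~ N(A^{-1} X'y, sigma2 A^{-1}),  A = X'X + D^{-1}
        A = XtX + np.diag(d_inv(tau2))
        L = np.linalg.cholesky(A)             # Cholesky factor, reused for the solve and for sampling
        mean = np.linalg.solve(L.T, np.linalg.solve(L, Xty))
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # Block 2: 1/tau_g^2 | beta, sigma2 ~ Inverse-Gaussian(sqrt(lam^2 sigma2 / ||beta_g||^2), lam^2)
        for k, g in enumerate(gids):
            b = beta[groups == g]
            tau2[k] = 1.0 / rng.wald(np.sqrt(lam**2 * sigma2 / max(b @ b, 1e-12)), lam**2)
        # Block 3: sigma2 | beta, tau2, y ~ Inv-Gamma((n + p)/2, (||y - X beta||^2 + beta' D^{-1} beta)/2)
        resid = y - X @ beta
        rate = 0.5 * (resid @ resid + beta @ (d_inv(tau2) * beta))
        sigma2 = 1.0 / rng.gamma(0.5 * (n + p), 1.0 / rate)
        draws[it] = beta
    return draws
```

A two-block variant would instead draw $(\beta, \sigma^2)$ jointly given the scales, typically by sampling $\sigma^2$ from its conditional with $\beta$ integrated out and then $\beta \mid \sigma^2$, which is the style of scheme whose geometric ergodicity is referenced above.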

3. Theoretical Properties: Convergence Rates, Sparsity Recovery, and Uncertainty Quantification

BGSR approaches possess strong nonasymptotic and asymptotic properties:

  • Posterior Contraction: Minimax rates for estimation and prediction, typically

$$\epsilon_n = \sqrt{\frac{s_0 \log G}{n}},$$

where $s_0$ is the number of truly nonzero groups, are attainable for both the MAP estimator and the full posterior under appropriate priors, regularity of the design matrix (restricted eigenvalue, compatibility), and minimal signal assumptions [$2007.07021$, $1903.01979$, $2411.15713$, $2412.15293$].

  • Model/Support Recovery: Selection consistency (group support recovery) is established under suitable complexity or sparsity assumptions and beta-min conditions; spike-and-slab priors yield exact sparsity, while under global-local priors the support is recovered by post-processing (DSS, projection) [$1512.01013$, $1709.04333$, $1903.01979$].
  • Uncertainty Quantification: De-biased estimators and Bernstein–von Mises (BvM) results deliver asymptotic normality and valid credible intervals, with near-nominal or exact coverage for credible regions after de-biasing [$2411.15713$, $1807.03439$].
  • Rate-Optimality vs Frequentist Group Lasso: Bayesian MAP/posterior rates do not require the stringent condition $\sum_g \sqrt{m_g}\,\|\beta_{0g}\|_2 < \infty$ needed for the group lasso, and the full posterior contracts at the optimal rate, avoiding the group lasso's over-shrinkage, especially in high dimensions [$2007.07021$, $1903.01979$].
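
For intuition about the contraction rate displayed above, the short sketch below evaluates $\epsilon_n$ for a few problem sizes (constants and the within-group dimension term are omitted): the number of candidate groups $G$ enters only logarithmically, while the number of active groups $s_0$ and the sample size $n$ enter polynomially.

```python
import numpy as np

def group_sparse_rate(n, G, s0):
    """Order of the group-sparse contraction rate eps_n = sqrt(s0 * log(G) / n);
    constants and the within-group dimension term are omitted."""
    return np.sqrt(s0 * np.log(G) / n)

# Increasing the number of candidate groups G ten-fold moves the rate only mildly,
# whereas halving n or doubling s0 inflates it by a factor of about sqrt(2).
for G in (1_000, 10_000, 100_000):
    print(f"G = {G:>6d}: eps_n ~ {group_sparse_rate(n=500, G=G, s0=10):.3f}")
```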

4. Methodological Innovations: Adaptive Shrinkage, Correlation Control, and Model Extensions

Recent advancements have expanded BGSR's flexibility:

  • Adaptive Tail and Spike Control: BetaPrime-NBP priors (as in GRASP) allow independent learning of the pole at zero (spike strength) and tail decay (regularization, robustness to large coefficients), interpolating between heavy-tailed, sparse, and ridge-like behaviors [$2506.18092$].
  • Explicit Correlation Modeling: GRASP quantifies the within-group correlation of local shrinkage parameters through the variances of the shrinkage scales, giving insight into how the prior adapts to the grouping structure [$2506.18092$].
  • Group R2D2 Prior: Group-level $R^2$ allocation via Dirichlet or logistic-normal decomposition, hierarchical local/global mixing, and theoretical global-local tail robustness [$2412.15293$].
  • Sparse-Projection Posterior: Posterior-induced sparsification via convex/proximal projection, with extensions to nonparametric additive models, and rigorous (frequentist) credible set coverage via de-biased projections [$2411.15713$].
  • Post-selection Inference: Bayesian adjustment for selection-induced bias after group lasso selection, applicable to non-polyhedral selection events (norm constraints, overlapping groups), using exact likelihood adjustments and Laplace-type normalizations [$2012.15664$].
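
The sparse projection-posterior recipe above can be sketched as follows: draws from a dense (e.g., global-local shrinkage) posterior are passed through a sparsifying group-lasso map, and the induced posterior over group supports is read off the projected draws. The sketch assumes a groupwise-orthonormalized design so that the projection reduces to groupwise soft-thresholding; in general the projection is itself a group-lasso fit with the posterior draw as the working truth, and de-biasing is required for exact frequentist coverage. Function names are illustrative.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Groupwise soft-thresholding: theta_g = (1 - lam / ||beta_g||)_+ * beta_g.
    This is the exact group-lasso projection when X'X / n = I; otherwise a simplification."""
    theta = np.zeros_like(beta)
    for g in np.unique(groups):
        b_g = beta[groups == g]
        norm = np.linalg.norm(b_g)
        if norm > lam:
            theta[groups == g] = (1.0 - lam / norm) * b_g
    return theta

def projected_group_inclusion(draws, groups, lam):
    """Map each dense posterior draw through the sparsifying projection and report the
    induced posterior probability that each group survives (is selected)."""
    gids = np.unique(groups)
    counts = np.zeros(len(gids))
    for beta in draws:                        # draws: (n_draws, p) array from a shrinkage-prior sampler
        theta = group_soft_threshold(beta, groups, lam)
        for k, g in enumerate(gids):
            counts[k] += np.any(theta[groups == g] != 0.0)
    return counts / len(draws)
```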

5. Practical Applications and Empirical Performance

BGSR has been successfully deployed in diverse high-dimensional settings:

  • Omics and Imaging Genetics: BGSR multi-task, group-sparse, and hierarchical models enable gene-based association with brain imaging phenotypes, delivering improved credible interval coverage versus nonparametric bootstrap, and revealing interpretable SNP–ROI relationships [$1605.02234$, $1807.03439$, $2105.10888$].
  • Neural Recovery and Network Inference: Two-level BGSR with deterministic EP or variational inference is competitive in large-scale network recovery, gene regulatory network estimation, and signal estimation tasks [$1809.09367$].
  • Nonparametric Additive Models: Group-sparse projection posteriors and Bayesian group lasso extensions recover oracle properties in smooth structured additive regression, handle basis expansion and group interactions, and deliver interpretable function selection [$2411.15713$, $2412.15293$].
  • High-Frequency MIMO and Signal Processing: EM-based BGSR frameworks adapt to structured common-support sparsity in THz multi-user wideband MIMO channel estimation, attaining NMSE and BER close to the Bayesian Cramér-Rao bound in high-dimensional, multiple-measurement regression with group structure [$2511.12102$].

Benchmark results consistently show BGSR methods attaining lower prediction and estimation error, enhanced support recovery, and valid uncertainty quantification compared to penalized frequentist methods and vanilla Bayesian shrinkage models.

6. Limitations, Challenges, and Future Directions

  • Exact Sparsity and Post-Processing: Most continuous shrinkage priors (horseshoe/NBP/R2D2) yield dense posteriors; variable selection requires explicit thresholding, projection, or DSS to produce sparser solutions [$2506.18092$, $1709.04333$].
  • Computational Overhead: Full MCMC and scale-mixture methods can be costly for very large $p$, although two-block Gibbs, variational, and projection approaches ameliorate scaling constraints [$1903.06964$, $2309.10378$].
  • Hyperparameter Sensitivity: Performance of hierarchical shrinkage heavily depends on choice of hyperpriors (especially for shape/tail parameters); half-Cauchy is robust but may require monitoring for mixing [$2506.18092$].
  • Extensions: Incorporating overlapping, multilevel, or data-driven group structures and adaptation to generalized responses (e.g. GLM, Poisson, logistic, multivariate), as well as Bayesian methods for inference after selection (e.g. randomized or cross-validated group lasso), continue to be active research avenues [$2007.07021$, $2012.15664$].
  • Uncertainty Calibration for Variational Methods: VB and EP credible intervals are often slightly anti-conservative, motivating the development of de-biased or corrected variants [$2309.10378$].

BGSR remains a vibrant domain unifying sparsity, adaptivity, and credible Bayesian inference in structured high-dimensional regression, with ongoing advances in both theory and scalable algorithms.
