Bayesian Global-Local Shrinkage with Univariate Guidance for Ultra-High-Dimensional Regression

Published 3 Apr 2026 in stat.ME | (2604.04964v1)

Abstract: We propose Bayesian Univariate-Guided Sparse Regression (BUGS), a novel global-local shrinkage framework that incorporates marginal association information directly into the prior through a continuous modulation of shrinkage. Unlike existing approaches that treat predictors symmetrically or rely on post hoc screening, BUGS embeds univariate guidance within the nonlinear variance structure of a regularized horseshoe prior, inducing adaptive shrinkage that enhances signal-noise separation. We establish theoretical guarantees including prior concentration, posterior contraction, and guidance-induced shrinkage separation, while demonstrating robustness under uninformative guidance. To enable scalability in ultra-high dimensions, we develop BUGS-Active, an active-set MCMC approximation that restricts local updates to a data-adaptive subset A_n, reducing per-iteration complexity from O(p) to O(|A_n|) while preserving key theoretical properties such as sure screening and contraction. Empirically, the proposed framework achieves strong signal recovery together with substantially improved control of false discovery rates relative to existing methods. BUGS-Active scales to dimensions up to p = 1,000,000, and is applied to a DNA methylation study with n=1051 subjects and approximately 850,000 CpG sites, yielding accurate prediction and interpretable sparse selection. These results establish marginally guided shrinkage as a powerful and scalable paradigm for high-dimensional Bayesian inference.

Abstract PDF Upgrade to Chat

Authors (1)

Priyam Das

Summary

The paper introduces BUGS, which integrates univariate guidance into the horseshoe prior to adaptively modulate shrinkage in ultra-high-dimensional regression.
It presents BUGS-Active, a computational strategy that dynamically selects active variable sets, reducing complexity for p in the millions.
Empirical results demonstrate significant FDR reductions and enhanced signal detection in genomic data, supported by strong theoretical guarantees.

Bayesian Univariate-Guided Global-Local Shrinkage for Ultra-High-Dimensional Regression

Introduction and Context

This work presents a principled extension of Bayesian global-local shrinkage—specifically the horseshoe family of priors—via the direct incorporation of univariate (marginal) guidance into the prior hierarchy, yielding the Bayesian Univariate-Guided Sparse Regression (BUGS) and its computational variant, BUGS-Active. The methodology is situated toward sparse inference and model selection in ultra-high-dimensional linear models where $p \gg n$ , and is particularly suited for genomic or epigenomic settings, such as DNA methylation studies with hundreds of thousands to millions of candidate predictors.

Traditional global-local priors (e.g., regularized horseshoe) treat predictors symmetrically a priori, deferring all adaptation to the likelihood. The key innovation is to embed marginal relevance information, assessed for each predictor, into the shrinkage mechanism in a continuous, covariate-dependent, but fully Bayesian fashion.

Marginal Guidance in Shrinkage Priors

Within the BUGS framework, for each coefficient $\beta_j$ , the marginal relevance is quantified by a standardized statistic (e.g., scaled absolute marginal correlation), transformed and bounded to produce a guidance score $z_j^*$ . This statistic is computed a priori from observed data and then treated as fixed (not subject to Bayesian inference), analogous to conditioning on an empirical Bayes prior.

Marginal guidance influences the shrinkage hierarchy by modulating the local variance:

Figure 1: Schematic of the guided regularized horseshoe prior, in which the effective scale for $\beta_j$ is determined by both the classical global-local scales and a multiplicative guidance multiplier dependent on marginal association.

Specifically, the effective prior variance for $\beta_j$ is

$\tilde{\kappa}_j^2 = \frac{c^2 \tau^2 \lambda_j^2 \exp(\eta z_j^*)}{c^2 + \tau^2 \lambda_j^2 \exp(\eta z_j^*)}$

where $\lambda_j$ is the local scale, $\tau$ the global scale, $c$ the slab scale, and $\eta$ the strength of the guidance effect (also assigned a prior and learned from the data). The term $\beta_j$ 0 amplifies (attenuates) the effective variance for variables with large (small) marginal effects.

Figure 2: The effect of marginal guidance on the prior; positive $\beta_j$ 1 increases the local prior scale, enabling earlier escape from shrinkage, whereas negative $\beta_j$ 2 intensifies shrinkage.

This formulation alters not just the amount of shrinkage per variable but structurally shifts the transition point from strong shrinkage to slab behavior in the prior—a key distinction from baseline weighted or covariate-dependent shrinkage priors, which merely modulate scale without affecting the nonlinearity of the thresholding.

Posterior Computation and Active-Set Approximation

For $\beta_j$ 3 up to approximately $\beta_j$ 4, the full model may be sampled via a hybrid Gibbs/slice sampler with Woodbury-based block updates for the coefficient vector. To enable inference for $\beta_j$ 5, BUGS-Active restricts expensive local scale updates to a dynamic active set of variables prioritized by their guidance scores and/or large posterior coefficients. This reduces per-iteration complexity from $\beta_j$ 6 to $\beta_j$ 7 for local updates, while the coefficient vector is always sampled globally.

BUGS-Active is more than a computational heuristic: the paper establishes that when the active set screening satisfies suitable separation properties (codified via marginal score and posterior thresholding), theoretical properties such as posterior concentration, sure screening, and false discovery control are preserved asymptotically.

Figure 3: Impact of the guidance budget (size of active set for local updates) on posterior estimation for strong signals, illustrating the trade-off between aggressive computational savings and accurate posterior recovery.

Theoretical Properties

The paper synthesizes prior concentration and posterior contraction theory for the guided prior, explicitly accounting for both informative and uninformative marginal statistics. Main results include:

The guided prior retains optimal (minimax) contraction rates for sparse high-dimensional linear models, matching unguided global-local priors when guidance is uninformative.
Under informative guidance, the prior induces systematic shrinkage separation between signals and noise, enhancing FDR control.
The active-set posterior with data-driven screening preserves support and contraction rates if screening thresholds are adequately separated from the noise distribution.

These properties are established under standard high-dimensional conditions (sparsity, restricted eigenvalue, bounded signals, bounded design norm), with the guidance vector and active set treated as fixed, potentially constructed from auxiliary data when necessary for formal validity.

Empirical Evaluation

Extensive simulations demonstrate that the guided prior yields substantial reductions in false discovery rates (FDR), often an order of magnitude lower than standard global-local or LASSO-type methods, while maintaining near-perfect true positive rates and high Matthews correlation coefficients (MCC) even as $\beta_j$ 8 increases to $\beta_j$ 9.

Figure 4: Marginal guidance effect in low dimension: guidance-modulated prior (red) yields more aggressive noise shrinkage and substantial signal separation compared to regularized horseshoe (blue).

The methodology is applied to an ultra-high-dimensional DNA methylation dataset ( $z_j^*$ 0, $z_j^*$ 1), yielding sparse, interpretable effect estimates and improved out-of-sample predictive performance when compared to an unguided prior.

Figure 5: Posterior summaries for the top 10 identified CpGs influencing age in methylation data, posterior selection probabilities, effect directions, and predictive performance by developmental stage.

The results indicate that marginal guidance provides a continuous, data-adaptive regularization path that sharpens separation between true signals and noise, directly translating into improved selection and estimation properties in both synthetic and real datasets.

Discussion and Implications

BUGS and BUGS-Active contribute a scalable, theoretically justified Bayesian methodology for sparse variable selection and estimation in ultra-high-dimensional settings. Integration of marginal relevance as a multiplicative factor in the nonlinear shrinkage mapping enables more responsive adaptation and improved FDR control, especially relevant where protector-wise informativeness varies.

The framework provides a formal bridge between classical sure independence screening and modern Bayesian shrinkage, and its active-set approach delivers practical computational tractability for $z_j^*$ 2 in the millions.

Key implications include:

For statistical genomics and similar fields, the approach enables principled, uncertainty-quantified, interpretable inference without aggressive pre-screening or ad hoc post-selection adjustments.
Theoretical robustness to uninformative guidance means the method is safe to deploy in scenarios where prior predictor information is weak or unreliable.
As AI applications increasingly intersect with high-throughput data, such as feature selection in automated science, guided shrinkage mechanisms of this kind increase both the statistical performance and transparency of inference under extreme dimensionality.
Further extensions to generalized linear and survival models, as well as adaptively tuned guidance budgets, will enable broader deployment in practice.

Conclusion

BUGS and BUGS-Active demonstrate that marginally guided shrinkage—realized through a data-adaptive modulation of the prior in the regularized horseshoe family—offers substantial advantage for ultra-high-dimensional sparse regression. The methodology brings tighter FDR control, improved interpretability, and practical scalability, with strong theoretical guarantees and demonstrable empirical superiority. This paradigm, integrating univariate guidance with nonlinear shrinkage, represents a natural and powerful direction for hierarchical Bayes in the era of massive, sparse data (2604.04964).

Markdown Report Issue