
Bayesian Pliable Lasso with Horseshoe Prior

Updated 7 October 2025
  • The methodology extends the frequentist pliable lasso by incorporating a horseshoe prior to enforce strong heredity for both main and interaction effect selection.
  • It employs a hierarchical global-local shrinkage framework with Gaussian and inverse-gamma priors, facilitating efficient Gibbs sampling and robust posterior inference.
  • Empirical results demonstrate improved variable recovery and prediction accuracy compared to traditional lasso methods in high-dimensional regression contexts.

The Bayesian Pliable Lasso with Horseshoe Prior is a hierarchical, global–local shrinkage framework designed to enable sparse estimation and uncertainty quantification for both main and interaction effects in high-dimensional regression and generalized linear models (GLMs). The methodology extends the frequentist pliable lasso, which is noted for its ability to model interactions under strong heredity constraints, by introducing explicit probabilistic modeling of sparsity and effect selection using the horseshoe prior, with extensions to handle missing responses via integrated data augmentation and efficient Gibbs sampling (Mai, 9 Sep 2025).

1. Model Formulation and Motivating Principles

The Bayesian pliable lasso is built to identify a small set of predictors and their interactions with modifying variables while maintaining statistical interpretability and robust uncertainty assessment. The fundamental regression setting supposes observations $(y_i, x_i, Z_i)$, where $y_i$ is the scalar response, $x_i \in \mathbb{R}^p$ is a vector of predictors, and $Z_i \in \mathbb{R}^q$ is a vector of modifiers (e.g., categorical covariates or environmental factors).

The linear predictor for subject ii is given by

$$\eta_i = \beta_0 + Z_i^\top \theta_0 + \sum_{j=1}^{p} x_{ij}\left( \beta_j + Z_i^\top \theta_j \right).$$

This structure decomposes into main effects ($\beta$), modifier-specific intercepts ($\theta_0$), and interaction/heterogeneity effects ($\theta_j$). The model extends naturally to exponential family likelihoods for GLMs.
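As a concrete illustration, the following sketch computes this linear predictor for a full sample. It is a minimal NumPy illustration of the formula above, not code from the hspliable package; the array names and shapes are assumptions.

```python
import numpy as np

def linear_predictor(X, Z, beta0, theta0, beta, Theta):
    """eta_i = beta0 + Z_i' theta0 + sum_j x_ij (beta_j + Z_i' theta_j).

    X      : (n, p) predictor matrix
    Z      : (n, q) modifier matrix
    beta0  : scalar global intercept
    theta0 : (q,) modifier-specific intercepts
    beta   : (p,) main effects
    Theta  : (p, q) interaction effects, row j holding theta_j
    """
    # Intercept, modifier-intercept, and main-effect contributions
    eta = beta0 + Z @ theta0 + X @ beta
    # Interaction contribution: (Z @ Theta.T)[i, j] = Z_i' theta_j, weighted by x_ij
    eta += np.sum(X * (Z @ Theta.T), axis=1)
    return eta
```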

The horseshoe prior is placed on both main effects and their corresponding interaction vectors by coupling their shrinkage scales:

  • For each predictor $j$:
    • $\beta_j \sim N(0, \lambda_j^2 \tau^2)$
    • $\theta_j \sim N(0, \lambda_j^2 \tau^2 I_q)$
    • $\lambda_j \sim \mathrm{Half\text{-}Cauchy}(0,1)$

A single global scale parameter $\tau \sim \mathrm{Half\text{-}Cauchy}(0,1)$ further regularizes the collective magnitude of all effects.

The strong heredity constraint is enforced by assigning both main and interaction effects the same local scale $\lambda_j$, ensuring that an inactive (i.e., heavily shrunk) main effect automatically suppresses its associated interaction terms.
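A short NumPy sketch of a single draw from this coupled prior (illustrative only; the dimensions and random seed are arbitrary choices made here) makes the heredity mechanism concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 10, 3

# Half-Cauchy(0, 1) scales drawn as absolute values of standard Cauchy variates
tau = np.abs(rng.standard_cauchy())
lam = np.abs(rng.standard_cauchy(size=p))

# The shared local scale lam[j] couples beta_j with its whole interaction vector theta_j
beta = rng.normal(0.0, lam * tau)                           # beta_j ~ N(0, lam_j^2 tau^2)
Theta = rng.normal(0.0, (lam * tau)[:, None], size=(p, q))  # theta_j ~ N(0, lam_j^2 tau^2 I_q)

# Predictors with a tiny lam[j] have both beta_j and theta_j shrunk toward zero,
# which is how the prior encodes strong heredity.
```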

2. Hierarchical Prior Specification and Heredity

The hierarchical global–local prior, central to the horseshoe approach, possesses several defining characteristics:

  • Very strong spike at zero: induces aggressive shrinkage on noise effects and enforces sparsity.
  • Extremely heavy tails: allows nonzero effects to "escape" shrinkage and prevents over-suppression of important predictors or interactions.
  • Hierarchical coupling: the shared local scale $\lambda_j$ for a group (main effect and its interactions) enforces group shrinkage and naturally encodes the strong heredity principle.

This structure is formalized as:

$$
\begin{aligned}
\beta_j &\sim N(0, \lambda_j^2 \tau^2), \qquad \theta_j \sim N(0, \lambda_j^2 \tau^2 I_q), \\
\lambda_j^2 &\sim \mathrm{IG}\!\left(\tfrac12, \tfrac{1}{\nu_j}\right), \qquad \nu_j \sim \mathrm{IG}\!\left(\tfrac12, 1\right), \\
\tau^2 &\sim \mathrm{IG}\!\left(\tfrac12, \tfrac{1}{\xi}\right), \qquad \xi \sim \mathrm{IG}\!\left(\tfrac12, 1\right).
\end{aligned}
$$

The inverse-gamma representation allows efficient conjugate Gibbs updates. An analogous prior is placed on $\theta_0$, the modifying-variable intercept, while the global intercept $\beta_0$ is left unpenalized.
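This auxiliary inverse-gamma construction is equivalent to the half-Cauchy priors of Section 1; as a quick check (derivation supplied here for completeness), integrating out $\nu_j$ gives

$$
\begin{aligned}
p(\lambda_j^2) &= \int_0^\infty \mathrm{IG}\!\left(\lambda_j^2 \,\middle|\, \tfrac12, \tfrac{1}{\nu_j}\right)\, \mathrm{IG}\!\left(\nu_j \,\middle|\, \tfrac12, 1\right) d\nu_j \\
&= \frac{(\lambda_j^2)^{-3/2}}{\pi} \int_0^\infty \nu_j^{-2} \exp\!\left(-\frac{1 + 1/\lambda_j^2}{\nu_j}\right) d\nu_j
= \frac{(\lambda_j^2)^{-1/2}}{\pi\,(1 + \lambda_j^2)},
\end{aligned}
$$

so that, after the change of variables to $\lambda_j$, $p(\lambda_j) = 2/\{\pi(1+\lambda_j^2)\}$, which is exactly the $\mathrm{Half\text{-}Cauchy}(0,1)$ density. The same calculation applies to the pair $(\tau^2, \xi)$.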

3. Posterior Computation and Gibbs Sampling

A blockwise Gibbs sampler is constructed to exploit the conjugacy of the Gaussian likelihood and the normal/inverse-gamma prior components. At each iteration, updates proceed as follows:

  • $\beta_j \mid \cdot \sim N(\mu_{\beta_j}, V_{\beta_j})$, with $V_{\beta_j} = \big(w_j^\top w_j /\sigma^2 + 1/(\lambda_j^2\tau^2)\big)^{-1}$ and $\mu_{\beta_j}$ the corresponding mean conditional on the other effects, the current residuals, and the prior parameters.
  • $\theta_j \mid \cdot \sim N_q(\mu_{\theta_j}, V_{\theta_j})$, with analogous updates reflecting the penalty and the design structure.
  • Local scales: $\lambda_j^2$ and the auxiliary variables $\nu_j$ are updated from their inverse-gamma full conditionals.
  • Global scale and its auxiliary: $\tau^2$ and $\xi$ are updated via inverse-gamma forms, ensuring adaptation to the overall sparsity level.
  • Missing responses: when $y$ has missing values, these are imputed conditionally on the predictors and regression effects, drawing $y_i \sim N(\mu_i, \sigma^2)$ in the Gaussian case at each iteration.
  • Intercepts and error variance: sampled from their conjugate distributions.

This blockwise approach maintains computational tractability even for high-dimensional main and modifier spaces (Mai, 9 Sep 2025).
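The following NumPy sketch shows one such sweep for the Gaussian likelihood under the priors of Section 2. It is an illustrative reconstruction of the updates listed above, not the hspliable implementation: intercepts are omitted, and the Jeffreys-type prior on $\sigma^2$ used in the final step is an assumption made here for brevity.

```python
import numpy as np

def gibbs_sweep(y, X, Z, beta, Theta, lam2, nu, tau2, xi, sigma2, rng):
    """One blockwise Gibbs sweep for the Gaussian model (illustrative sketch only)."""
    n, p = X.shape
    q = Z.shape[1]
    # Current linear predictor (intercepts omitted in this simplified sketch)
    eta = X @ beta + np.sum(X * (Z @ Theta.T), axis=1)

    for j in range(p):
        # ---- beta_j | rest: conjugate normal update on the partial residual ----
        r_j = y - eta + X[:, j] * beta[j]
        V = 1.0 / (X[:, j] @ X[:, j] / sigma2 + 1.0 / (lam2[j] * tau2))
        m = V * (X[:, j] @ r_j) / sigma2
        eta -= X[:, j] * beta[j]
        beta[j] = rng.normal(m, np.sqrt(V))
        eta += X[:, j] * beta[j]

        # ---- theta_j | rest: q-dimensional normal update, interaction design W_j = x_j * Z ----
        W_j = X[:, [j]] * Z
        r_j = y - eta + W_j @ Theta[j]
        prec = W_j.T @ W_j / sigma2 + np.eye(q) / (lam2[j] * tau2)
        cov = np.linalg.inv(prec)
        mean = cov @ (W_j.T @ r_j) / sigma2
        eta -= W_j @ Theta[j]
        Theta[j] = rng.multivariate_normal(mean, cov)
        eta += W_j @ Theta[j]

        # ---- local scale lam2_j and its auxiliary nu_j: inverse-gamma full conditionals ----
        ss = beta[j] ** 2 + Theta[j] @ Theta[j]
        lam2[j] = 1.0 / rng.gamma((q + 2) / 2.0, 1.0 / (1.0 / nu[j] + ss / (2.0 * tau2)))
        nu[j] = 1.0 / rng.gamma(1.0, 1.0 / (1.0 + 1.0 / lam2[j]))

    # ---- global scale tau2 and its auxiliary xi ----
    ss_all = np.sum((beta ** 2 + np.sum(Theta ** 2, axis=1)) / lam2)
    tau2 = 1.0 / rng.gamma((p * (q + 1) + 1) / 2.0, 1.0 / (1.0 / xi + ss_all / 2.0))
    xi = 1.0 / rng.gamma(1.0, 1.0 / (1.0 + 1.0 / tau2))

    # ---- error variance under a Jeffreys-type prior (assumption for this sketch) ----
    resid = y - eta
    sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / (resid @ resid))

    return beta, Theta, lam2, nu, tau2, xi, sigma2
```

Sweeping coordinate blocks against a running copy of the linear predictor keeps each coefficient update cheap, which is what makes the blockwise scheme practical when the main and modifier spaces are large.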

4. Extension to Generalized Linear Models and Missing Data

The model can be generalized for arbitrary exponential family outcomes. For each $i$,

$$y_i \sim \mathrm{ExponentialFamily}(\eta_i),$$

with the linear predictor as defined above. The Gibbs sampler is modified so that, where possible, Pólya–Gamma data augmentation or other latent variable approaches are used to maintain conjugacy (as in logistic regression), or Metropolis–Hastings updates are used otherwise.
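For concreteness, a hedged sketch of one Pólya–Gamma-augmented block update in the logistic case is given below; the helper draw_polya_gamma is a hypothetical placeholder for whichever PG sampler is actually used, and the block/offset decomposition is an assumption made for illustration.

```python
import numpy as np

def pg_block_update(y, W, offset, prior_prec, eta, draw_polya_gamma, rng):
    """One Polya-Gamma-augmented draw of a coefficient block in logistic regression.

    y          : (n,) binary responses in {0, 1}
    W          : (n, k) design block for the coefficients being updated
    offset     : (n,) part of the linear predictor not involving this block
    prior_prec : (k, k) normal prior precision, e.g. I_q / (lam_j^2 * tau^2)
    eta        : (n,) current full linear predictor
    draw_polya_gamma(b, c) : hypothetical elementwise PG(b, c) sampler
    """
    kappa = y - 0.5
    omega = draw_polya_gamma(np.ones_like(eta), eta)   # omega_i ~ PG(1, eta_i)
    prec = W.T @ (omega[:, None] * W) + prior_prec     # W' Omega W + prior precision
    cov = np.linalg.inv(prec)
    mean = cov @ (W.T @ (kappa - omega * offset))
    return rng.multivariate_normal(mean, cov)
```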

When response data contain missingness, integrative data augmentation is used: missing $y_i$ are imputed inside the MCMC, using either their full data likelihood (when possible) or conditional predictive draws. This approach naturally yields posterior inference under the observed-data likelihood.
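In the Gaussian case this imputation step amounts to a single conditional draw per iteration; a tiny sketch (miss is an assumed Boolean missingness mask, eta the current linear predictor):

```python
import numpy as np

def impute_missing_gaussian(y, miss, eta, sigma2, rng):
    """Replace missing y_i by draws from N(eta_i, sigma^2) inside the Gibbs sweep."""
    y = y.copy()
    y[miss] = rng.normal(eta[miss], np.sqrt(sigma2))
    return y
```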

5. Theoretical Properties and Practical Implications

The global–local horseshoe prior structure is supported by a substantial theoretical foundation:

  • The posterior under the horseshoe prior contracts at a near-minimax rate in ultra-sparse, high-dimensional regimes (Pas et al., 2017), and the MMLE or hierarchical Bayes construction on $\tau$ yields this contraction adaptively, without requiring prior knowledge of the sparsity level.
  • Aggressive shrinkage at zero ensures the exclusion of irrelevant predictors and interactions, with the heavy-tailed prior safeguarding against shrinkage of large signals.
  • The conjugate hierarchical structure admits direct computation of the marginal likelihood for model selection or hyperparameter tuning (e.g., via Chib's algorithm) (Makalic et al., 2015).
  • The approach enables interpretable, hierarchically constrained interaction modeling, uncertainty quantification through credible intervals, and full Bayesian inference for both coefficient sets and derived functionals.

In simulation and real-world studies (e.g., neuroimaging and clinical data), the Bayesian pliable horseshoe consistently outperforms standard lasso, frequentist pliable lasso, and classical horseshoe models in variable selection, recovery of true main and interaction structure, and prediction error. Notably, when modifying variables are binary or interaction complexity is high, the hereditary shrinkage mechanism brings substantial advantages for both estimation accuracy and parsimony (Mai, 9 Sep 2025).

6. Computational and Implementation Aspects

The model and inference procedure are implemented in the hspliable R package, which relies on Rcpp and RcppArmadillo for efficient matrix operations and large-scale computation. All essential posterior components are updated in blocks using conjugacy and efficient linear algebra, and the package supports both complete and missing outcome data. Simulation studies and real data examples illustrate the scalability and interpretability of the method, with point estimates and credible intervals highlighting the model's capacity for meaningful uncertainty quantification and interaction recovery.

7. Contextualization within the Shrinkage and Sparse Estimation Literature

The Bayesian pliable lasso with horseshoe prior is positioned as a probabilistic generalization of lasso-type and global–local shrinkage methods:

  • The horseshoe prior is shown to possess regular variation and polynomial tails, unlike the Laplace (lasso) prior, which leads to reduced bias and improved large-signal recovery in sparse settings (Bhadra et al., 2015, Bhadra et al., 2017).
  • Variants such as horseshoe+ or deeper product mixtures may yield even tighter concentration and lower MSE in ultra-sparse applications (Bhadra et al., 2015).
  • Compared to point-mass mixture priors, the horseshoe and its generalizations achieve strong sparsity and computational feasibility in high dimensions without explicit variable selection indicators.
  • The hierarchical coupling of group shrinkage (heredity constraints) and local adaptivity could, in principle, be extended to even more flexible structures such as groupings, latent hierarchical layers, or graph-based penalties, as is done in some contemporary shrinkage frameworks.

A plausible implication is that the modeling framework of the Bayesian pliable lasso with horseshoe prior can serve as a prototype for generalized structured sparsity and interaction modeling—inheriting the strong theoretical guarantees, efficient posterior computation, and interpretability arising from the global–local shrinkage architecture.


In summary, the Bayesian Pliable Lasso with Horseshoe Prior provides a comprehensive, theoretically justified approach to sparse effect and interaction selection in regression and GLMs, with fully probabilistic treatment of heredity constraints, scalable Gibbs sampling via conjugate hierarchical modeling, and demonstrable improvement over classical and regularized alternatives (Mai, 9 Sep 2025).
