Bayesian Pliable Lasso with Horseshoe Prior
- The methodology extends the frequentist pliable lasso by incorporating a horseshoe prior that enforces strong heredity between main-effect and interaction selection.
- It employs a hierarchical global-local shrinkage framework with Gaussian and inverse-gamma priors, facilitating efficient Gibbs sampling and robust posterior inference.
- Empirical results demonstrate improved variable recovery and prediction accuracy compared to traditional lasso methods in high-dimensional regression contexts.
The Bayesian Pliable Lasso with Horseshoe Prior is a hierarchical, global–local shrinkage framework designed to enable sparse estimation and uncertainty quantification for both main and interaction effects in high-dimensional regression and generalized linear models (GLMs). The methodology extends the frequentist pliable lasso, which is noted for its ability to model interactions under strong heredity constraints, by introducing explicit probabilistic modeling of sparsity and effect selection using the horseshoe prior, with extensions to handle missing responses via integrated data augmentation and efficient Gibbs sampling (Mai, 9 Sep 2025).
1. Model Formulation and Motivating Principles
The Bayesian pliable lasso is built to identify a small set of predictors and their interactions with modifying variables while maintaining statistical interpretability and robust uncertainty assessment. The fundamental regression setting supposes observations $(y_i, x_i, z_i)$, $i = 1, \dots, n$, where $y_i$ is the scalar response, $x_i \in \mathbb{R}^p$ are predictors, and $z_i \in \mathbb{R}^K$ are modifiers (e.g., categorical covariates, environmental factors, etc.).
The linear predictor for subject $i$ is given by
$$\eta_i = \beta_0 + z_i^\top \theta_0 + \sum_{j=1}^{p} x_{ij}\left(\beta_j + z_i^\top \theta_j\right).$$
This structure decomposes into main effects ($\beta_j$), modifier-specific intercepts ($\theta_0$), and interaction/heterogeneity effects ($\theta_j$). The model extends naturally to exponential family likelihoods for GLMs.
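Concretely, the linear predictor is a few matrix operations. A minimal numpy sketch (array names and sizes are illustrative, not the `hspliable` interface):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 5, 3, 2                      # observations, predictors, modifiers

X = rng.normal(size=(n, p))            # predictors x_i
Z = rng.normal(size=(n, K))            # modifiers z_i
beta0 = 0.5                            # global intercept
theta0 = rng.normal(size=K)            # modifier-specific intercepts theta_0
beta = np.array([1.0, 0.0, -2.0])      # main effects beta_j
Theta = rng.normal(size=(p, K))        # interaction effects theta_j (rows)

# eta_i = beta0 + z_i' theta0 + sum_j x_ij * (beta_j + z_i' theta_j)
eta = beta0 + Z @ theta0 + np.sum(X * (beta[None, :] + Z @ Theta.T), axis=1)
print(eta.shape)  # (5,)
```

Note that `Z @ Theta.T` collects all $z_i^\top \theta_j$ terms at once, so the full predictor costs two matrix products rather than a double loop.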
The horseshoe prior is placed on both main effects and their corresponding interaction vectors by coupling their shrinkage scales:
- For each predictor $j = 1, \dots, p$:
$$\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \lambda_j^2 \tau^2), \qquad \theta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \lambda_j^2 \tau^2 I_K), \qquad \lambda_j \sim C^{+}(0, 1).$$
A single global scale parameter $\tau \sim C^{+}(0, 1)$ further regularizes the collective magnitude of all effects.
The strong heredity constraint is enforced by assigning both main and interaction effects the same local scale $\lambda_j$, ensuring that an inactive (i.e., heavily shrunk) main effect automatically suppresses its associated interaction terms.
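The coupling can be seen by sampling from the prior. In this hypothetical sketch each group $(\beta_j, \theta_j)$ shares one half-Cauchy local scale $\lambda_j$, so a tiny $\lambda_j$ shrinks the main effect and its entire interaction vector together:

```python
import numpy as np

rng = np.random.default_rng(1)
p, K, tau = 4, 3, 0.1

# One shared half-Cauchy local scale lambda_j per predictor group
lam = np.abs(rng.standard_cauchy(p))

# Main effect and its interactions use the SAME lambda_j (strong heredity)
beta = rng.normal(scale=tau * lam)                            # beta_j
Theta = rng.normal(scale=(tau * lam)[:, None], size=(p, K))   # theta_j rows

for j in range(p):
    print(f"lam_{j}={lam[j]:.3f}  |beta_{j}|={abs(beta[j]):.3f}  "
          f"||theta_{j}||={np.linalg.norm(Theta[j]):.3f}")
```

Groups with small $\lambda_j$ produce both a near-zero main effect and a near-zero interaction block, which is exactly the heredity behavior the shared scale encodes.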
2. Hierarchical Prior Specification and Heredity
The hierarchical global–local prior, central to the horseshoe approach, possesses several defining characteristics:
- Very strong spike at zero: induces aggressive shrinkage on noise effects and enforces sparsity.
- Extremely heavy tails: allows nonzero effects to "escape" shrinkage and prevents over-suppression of important predictors or interactions.
- Hierarchical coupling: the shared local scale $\lambda_j$ for a group (main effect and its interactions) enforces group shrinkage and naturally encodes the strong heredity principle.
This structure is formalized through the inverse-gamma scale-mixture representation of the half-Cauchy:
$$\lambda_j^2 \mid \nu_j \sim \mathrm{IG}\!\left(\tfrac{1}{2}, \tfrac{1}{\nu_j}\right), \quad \nu_j \sim \mathrm{IG}\!\left(\tfrac{1}{2}, 1\right), \qquad \tau^2 \mid \xi \sim \mathrm{IG}\!\left(\tfrac{1}{2}, \tfrac{1}{\xi}\right), \quad \xi \sim \mathrm{IG}\!\left(\tfrac{1}{2}, 1\right).$$
The inverse-gamma representation allows efficient conjugate Gibbs updates. A corresponding prior is placed on $\theta_0$, the modifying-variable intercept vector, while the global intercept $\beta_0$ is left unpenalized.
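The decomposition can be checked empirically: composing $\nu \sim \mathrm{IG}(1/2, 1)$ with $\lambda^2 \mid \nu \sim \mathrm{IG}(1/2, 1/\nu)$ recovers a half-Cauchy $C^{+}(0,1)$ draw for $\lambda$, whose median is 1. A simulation sketch (not the package's code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# nu ~ InvGamma(1/2, 1): reciprocal of a Gamma(shape=1/2, rate=1) draw
nu = 1.0 / rng.gamma(shape=0.5, scale=1.0, size=n)
# lambda^2 | nu ~ InvGamma(1/2, 1/nu): Gamma rate 1/nu means numpy scale nu
lam2 = 1.0 / rng.gamma(shape=0.5, scale=nu, size=n)
lam = np.sqrt(lam2)

# Reference draws directly from the half-Cauchy C+(0, 1)
ref = np.abs(rng.standard_cauchy(n))

print(np.median(lam), np.median(ref))  # both close to 1
```

The payoff is that every full conditional in the hierarchy is inverse-gamma, so no rejection or slice steps are needed for the scales.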
3. Posterior Computation and Gibbs Sampling
A blockwise Gibbs sampler is constructed to exploit the conjugacy of the Gaussian likelihood and the normal/inverse-gamma prior components. At each iteration, updates proceed as follows:
- Main effects: $\beta_j \mid \cdot \sim \mathcal{N}(m_j, v_j)$, with $v_j$ and $m_j$ the conditional variance and mean given the other effects, current data residuals, and prior parameters.
- Interaction effects: $\theta_j \mid \cdot \sim \mathcal{N}(m_{\theta_j}, V_{\theta_j})$, with analogous parameter updates reflecting the shared-scale penalty and the design structure.
- Local scale updates: $\lambda_j^2$ and auxiliary variables $\nu_j$ using their inverse-gamma full conditionals.
- Global scale and its auxiliary: $\tau^2$ and $\xi$ via inverse-gamma forms, ensuring adaptation to overall sparsity.
- Missing responses: when $y$ has missing values, these are imputed conditionally given predictors and regression effects, $y_i^{\mathrm{mis}} \sim \mathcal{N}(\eta_i, \sigma^2)$ (in the Gaussian case) at each iteration.
- Intercepts and error variance: sampled from their conjugate distributions.
This blockwise approach maintains computational tractability even for high-dimensional main and modifier spaces (Mai, 9 Sep 2025).
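To make the update cycle concrete, here is a simplified sketch of a Makalic–Schmidt-style Gibbs sampler for a plain horseshoe regression (main effects only; the pliable version additionally shares each $\lambda_j$ with the interaction block $\theta_j$, and this sketch uses the parameterization in which $\sigma^2$ also scales the prior). All names and the toy data are assumptions, not the `hspliable` implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

def ig(shape, scale):
    """Draw InvGamma(shape, scale) as the reciprocal of a Gamma draw."""
    return 1.0 / rng.gamma(shape, 1.0 / scale)

# Simulated sparse data (illustration only)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -3.0]
y = X @ beta_true + rng.normal(size=n)

# State: coefficients, error variance, global/local scales, auxiliaries
beta, sig2, tau2, xi = np.zeros(p), 1.0, 1.0, 1.0
lam2, nu = np.ones(p), np.ones(p)
draws = []

for it in range(1000):
    # beta | rest: Gaussian block, prior beta_j ~ N(0, sig2*tau2*lam2_j)
    A = X.T @ X + np.diag(1.0 / (tau2 * lam2))
    L = np.linalg.cholesky(A)
    mean = np.linalg.solve(A, X.T @ y)
    beta = mean + np.sqrt(sig2) * np.linalg.solve(L.T, rng.normal(size=p))
    # Local scales lam2_j and auxiliaries nu_j: conjugate IG updates
    lam2 = ig(1.0, 1.0 / nu + beta**2 / (2.0 * tau2 * sig2))
    nu = ig(1.0, 1.0 + 1.0 / lam2)
    # Global scale tau2 and its auxiliary xi
    tau2 = ig((p + 1) / 2.0, 1.0 / xi + np.sum(beta**2 / lam2) / (2.0 * sig2))
    xi = ig(1.0, 1.0 + 1.0 / tau2)
    # Error variance: conjugate inverse-gamma update
    resid = y - X @ beta
    sig2 = ig((n + p) / 2.0,
               resid @ resid / 2.0 + np.sum(beta**2 / (tau2 * lam2)) / 2.0)
    if it >= 500:                       # keep draws after burn-in
        draws.append(beta.copy())

post_mean = np.mean(draws, axis=0)
print(np.round(post_mean, 2))
```

On this toy problem the posterior mean recovers the two large coefficients while shrinking the eight null coefficients toward zero, illustrating the spike-and-heavy-tail behavior described above.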
4. Extension to Generalized Linear Models and Missing Data
The model can be generalized for arbitrary exponential family outcomes. For each $i = 1, \dots, n$:
$$y_i \mid \eta_i \sim p(y_i \mid \eta_i) \propto \exp\!\left\{ \frac{y_i \eta_i - b(\eta_i)}{\phi} \right\},$$
with the linear predictor as defined above. The Gibbs sampler is modified so that, where possible, Pólya–Gamma data augmentation or other latent variable approaches are used to maintain conjugacy (as in logistic regression), or Metropolis–Hastings updates are used otherwise.
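For logistic outcomes, the Pólya–Gamma trick introduces latents $\omega_i \sim \mathrm{PG}(1, \eta_i)$ so that the conditional for the regression coefficients is again Gaussian. The sketch below draws approximate $\mathrm{PG}(1, c)$ variates via the truncated infinite convolution of gammas from Polson, Scott and Windle's representation; the truncation level `K` is an assumption, and production samplers use exact (Devroye-type) methods instead:

```python
import numpy as np

rng = np.random.default_rng(4)

def polya_gamma(c, size, K=200):
    """Approximate PG(1, c) draws: truncated sum of scaled Gamma(1, 1)
    variates, omega = (1/(2*pi^2)) * sum_k g_k / ((k-1/2)^2 + c^2/(4*pi^2))."""
    k = np.arange(1, K + 1)
    g = rng.gamma(1.0, 1.0, size=(size, K))              # g_k ~ Gamma(1, 1)
    denom = (k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

# Sanity check against the known mean E[PG(1, c)] = tanh(c/2) / (2c)
c = 2.0
omega = polya_gamma(c, size=50_000)
print(omega.mean(), np.tanh(c / 2.0) / (2.0 * c))
```

Given such $\omega_i$, the logistic likelihood contributes a Gaussian term with working response $(y_i - 1/2)/\omega_i$, which is what restores conjugacy for the coefficient block.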
When response data contain missingness, integrative data augmentation is used: missing $y_i$ are imputed inside the MCMC, using either their full data likelihood (when possible) or conditional predictive draws. This approach naturally yields posterior inference under the observed-data likelihood.
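In the Gaussian case the imputation step reduces to one conditional normal draw per missing entry within each sweep. A schematic sketch (the mask and values are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
eta = rng.normal(size=n)               # current linear predictor values
sig2 = 0.5                             # current error variance draw
y = rng.normal(eta, np.sqrt(sig2))     # fully observed responses, then...
miss = np.array([False, True, False, False, True, False, False, False])
y = np.where(miss, np.nan, y)          # ...mask two entries as missing

# Data-augmentation step: draw each missing response from its conditional,
# y_i^mis | rest ~ N(eta_i, sig2), then continue with the complete vector
y_aug = y.copy()
y_aug[miss] = rng.normal(eta[miss], np.sqrt(sig2))
print(np.isnan(y_aug).any())  # False
```

Because the imputed values are redrawn at every iteration, downstream posterior summaries automatically average over the missing-data uncertainty.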
5. Theoretical Properties and Practical Implications
The global–local horseshoe prior structure is supported by a substantial theoretical foundation:
- The posterior under the horseshoe prior contracts at a near-minimax rate, adaptively, in ultra-sparse and high-dimensional regimes (Pas et al., 2017), and the MMLE or hierarchical-Bayes construction on $\tau$ yields adaptive contraction without requiring prior knowledge of the sparsity level.
- Aggressive shrinkage at zero ensures the exclusion of irrelevant predictors and interactions, with the heavy-tailed prior safeguarding against shrinkage of large signals.
- The conjugate hierarchical structure admits direct computation of the marginal likelihood for model selection or hyperparameter tuning (e.g., via Chib's algorithm) (Makalic et al., 2015).
- The approach enables interpretable, hierarchically constrained interaction modeling, uncertainty quantification through credible intervals, and full Bayesian inference for both coefficient sets and derived functionals.
In simulation and real-world studies (e.g., neuroimaging and clinical data), the Bayesian pliable horseshoe consistently outperforms standard lasso, frequentist pliable lasso, and classical horseshoe models in variable selection, recovery of true main and interaction structure, and prediction error. Notably, when modifying variables are binary or interaction complexity is high, the hereditary shrinkage mechanism brings substantial advantages for both estimation accuracy and parsimony (Mai, 9 Sep 2025).
6. Computational and Implementation Aspects
The model and inference procedure are implemented in the hspliable R package, which relies on Rcpp and RcppArmadillo for efficient matrix operations and large-scale computation. All essential posterior components are updated in blocks using conjugacy and efficient linear algebra, and the package supports both complete and missing outcome data. Simulation studies and real data examples illustrate the scalability and interpretability of the method, with point estimates and credible intervals highlighting the model's capacity for meaningful uncertainty quantification and interaction recovery.
7. Contextualization within the Shrinkage and Sparse Estimation Literature
The Bayesian pliable lasso with horseshoe prior is positioned as a probabilistic generalization of lasso-type and global–local shrinkage methods:
- The horseshoe prior is shown to possess regular variation and polynomial tails, unlike the exponentially tailed Laplace (lasso) prior; these heavy tails reduce bias and improve large-signal recovery in sparse settings (Bhadra et al., 2015, Bhadra et al., 2017).
- Variants such as horseshoe+ or deeper product mixtures may yield even tighter concentration and lower MSE in ultra-sparse applications (Bhadra et al., 2015).
- Compared to point-mass mixture priors, the horseshoe and its generalizations achieve strong sparsity and computational feasibility in high dimensions without explicit variable selection indicators.
- The hierarchical coupling of group shrinkage (heredity constraints) and local adaptivity could, in principle, be extended to even more flexible structures such as groupings, latent hierarchical layers, or graph-based penalties, as is done in some contemporary shrinkage frameworks.
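The tail contrast with the Laplace prior has a classic visual counterpart: in the normal-means setting with unit scales, the horseshoe shrinkage factor $\kappa = 1/(1+\lambda^2)$ follows a $\mathrm{Beta}(1/2, 1/2)$ law, piling mass near 0 (signals kept) and near 1 (noise removed). A quick empirical sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
lam = np.abs(rng.standard_cauchy(n))      # lambda ~ C+(0, 1)
kappa = 1.0 / (1.0 + lam ** 2)            # shrinkage factor in [0, 1]

# kappa ~ Beta(1/2, 1/2): U-shaped ("horseshoe") histogram with mean 1/2,
# so edge bins dominate the middle bins
hist, _ = np.histogram(kappa, bins=10, range=(0.0, 1.0))
print(kappa.mean(), hist[0] > hist[4], hist[9] > hist[4])
```

This U-shape is precisely the "selection-like" behavior that lets the horseshoe mimic point-mass mixtures without explicit inclusion indicators.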
A plausible implication is that the modeling framework of the Bayesian pliable lasso with horseshoe prior can serve as a prototype for generalized structured sparsity and interaction modeling—inheriting the strong theoretical guarantees, efficient posterior computation, and interpretability arising from the global–local shrinkage architecture.
In summary, the Bayesian Pliable Lasso with Horseshoe Prior provides a comprehensive, theoretically justified approach to sparse effect and interaction selection in regression and GLMs, with fully probabilistic treatment of heredity constraints, scalable Gibbs sampling via conjugate hierarchical modeling, and demonstrable improvement over classical and regularized alternatives (Mai, 9 Sep 2025).