Bayesian Pliable Lasso

Updated 10 September 2025
  • Bayesian Pliable Lasso is a probabilistic framework that extends pliable lasso by incorporating hierarchical horseshoe priors to model main and structured interaction effects under sparsity constraints.
  • It enforces strong heredity by sharing local shrinkage parameters between main effects and interactions, ensuring interpretable variable selection and robust performance.
  • Efficient Gibbs sampling with data augmentation supports posterior inference, missing data imputation, and accurate predictions in high-dimensional models.

Bayesian Pliable Lasso is a probabilistic framework that extends the pliable lasso methodology to regularized regression problems involving both main effects and structured interaction effects, particularly under the sparse regime where uncertainty quantification and adaptive shrinkage are crucial. This approach replaces deterministic penalty terms with hierarchical sparsity-inducing priors, typically the horseshoe prior, facilitating posterior inference and principled uncertainty assessment in high-dimensional models, including those with missing data and generalized linear models (GLMs) (Mai, 9 Sep 2025).

1. Hierarchical Bayesian Formulation

The Bayesian pliable lasso generalizes the frequentist pliable lasso by introducing hierarchical global-local shrinkage priors over model parameters. For data $y = (y_1, \dots, y_n)$, predictors $X$ ($n \times p$), and modifiers $Z$ ($n \times q$), the model employs the linear predictor

$$\eta_i = \beta_0 + Z_i^\top \theta_0 + \sum_{j=1}^p x_{ij} \left(\beta_j + Z_i^\top \theta_j \right)$$

with main effects $\beta_j$ and pliable (interaction/modifier) effects $\theta_j$ ($q$-dimensional).
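A compact way to evaluate this predictor for all observations is to stack the $\theta_j$ as rows of a $p \times q$ matrix $\Theta$, since $\sum_j x_{ij}\, Z_i^\top \theta_j$ is the $i$-th row sum of $(X\Theta) \odot Z$. A minimal helper in base R (names are illustrative, not taken from the hspliable package):

```r
# Evaluate eta_i = beta0 + Z_i' theta0 + sum_j x_ij (beta_j + Z_i' theta_j)
# for all i at once. Theta stacks the q-vectors theta_j as rows (p x q);
# rowSums((X %*% Theta) * Z) computes sum_j x_ij * (Z_i' theta_j) without loops.
linpred <- function(X, Z, beta0, theta0, beta, Theta) {
  drop(beta0 + Z %*% theta0 + X %*% beta + rowSums((X %*% Theta) * Z))
}
```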

Sparsity and heredity constraints are enforced through the hierarchical horseshoe prior:

$$\beta_j \sim \mathcal{N}(0, \lambda_j^2 \tau^2), \quad \theta_j \sim \mathcal{N}(0, \lambda_j^2 \tau^2 I_q), \quad \lambda_j \sim \text{half-Cauchy}(0,1), \quad \tau \sim \text{half-Cauchy}(0,1)$$

where each interaction block shares the local shrinkage scale $\lambda_j$ of its corresponding main effect, thus strictly enforcing strong heredity: interaction effects are only active if their parent main effect is nonzero (Mai, 9 Sep 2025).
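A few lines of base R make the shared-scale construction concrete. This is a sketch of prior sampling only (dimensions are illustrative), showing that a small $\lambda_j$ shrinks the main effect and its entire interaction block together:

```r
# Draw once from the hierarchical horseshoe prior with shared local scales.
set.seed(1)
p <- 5; q <- 3
tau    <- abs(rcauchy(1))            # global half-Cauchy(0,1) scale
lambda <- abs(rcauchy(p))            # local half-Cauchy(0,1) scales, one per predictor
beta   <- rnorm(p, 0, lambda * tau)  # main effects
Theta  <- matrix(rnorm(p * q, 0, rep(lambda * tau, q)), p, q)  # row j reuses lambda[j]
```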

2. Modeling Interaction Effects under Strong Heredity

The pliable lasso mechanism models interactions between predictors and modifying variables, letting each predictor's total effect vary as a sparse linear function of the modifiers:

$$y_i = \beta_0 + Z_i^\top \theta_0 + \sum_{j=1}^p x_{ij} (\beta_j + Z_i^\top \theta_j) + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2).$$

The strong heredity constraint is realized by sharing the local scale $\lambda_j$ between $\beta_j$ and $\theta_j$, so that interaction effects can be pruned jointly with the main effect. This constraint ensures that a modifier effect enters the model only when the corresponding main effect is present, supporting interpretability and mitigating overfitting in high dimensions (Mai, 9 Sep 2025). The framework naturally generalizes to settings where predictors enter both as main effects and as modifiers in interaction terms.
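The heredity structure is easy to see in a simulated example. The sketch below (illustrative sizes; uses the linpred() helper defined earlier) generates data in which interaction effects appear only for predictors whose main effects are nonzero:

```r
# Simulate from the pliable model under strong heredity.
set.seed(2)
n <- 200; p <- 10; q <- 2
X <- matrix(rnorm(n * p), n, p)
Z <- matrix(rbinom(n * q, 1, 0.5), n, q)   # binary modifiers
beta  <- c(2, -1.5, 1, rep(0, p - 3))      # sparse main effects
Theta <- matrix(0, p, q)
Theta[1, ] <- c(1, -1)                     # interactions only where beta_j != 0
Theta[3, ] <- c(0.5, 0)
y <- linpred(X, Z, beta0 = 0.5, theta0 = c(0.3, -0.2), beta, Theta) + rnorm(n)
```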

3. Posterior Inference and Missing Data

Posterior inference proceeds via Gibbs sampling with data augmentation. When the response vector contains missing entries, these are treated as latent variables and imputed at each iteration given the current parameter values:

$$y_i^{\text{miss}} \mid \text{rest} \sim \mathcal{N}(\eta_i, \sigma^2)$$

The remaining parameters are updated from conditional posteriors that exploit conjugacy:

  • $\beta_j$ (main effect): Gaussian draw with variance $[(x_j^\top x_j)/\sigma^2 + 1/(\lambda_j^2 \tau^2)]^{-1}$
  • $\theta_j$ (modifier coefficients): multivariate Gaussian draw
  • Shrinkage scales $\lambda_j^2$, $\tau^2$: inverse-gamma draws, exploiting the Makalic-Schmidt auxiliary parameterization of the half-Cauchy prior
  • Intercepts and noise variance: Gaussian and inverse-gamma updates

This mechanism permits coherent uncertainty quantification for both main and interaction effects, as well as for imputed missing responses, and scales to moderate/high dimensions without breaking conjugacy (Mai, 9 Sep 2025).
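The full cycle can be sketched in a few dozen lines of base R. This is a minimal illustration of the sampler described above, not the hspliable implementation: function and variable names are invented, a weak IG(1/2, 1/2) prior is assumed for $\sigma^2$, and flat priors are assumed for the intercepts.

```r
gibbs_pliable_hs <- function(y, X, Z, n_iter = 2000) {
  n <- nrow(X); p <- ncol(X); q <- ncol(Z)
  miss <- is.na(y); y[miss] <- mean(y, na.rm = TRUE)   # crude initialization of missing y
  beta0 <- 0; theta0 <- rep(0, q)
  beta  <- rep(0, p); Theta <- matrix(0, p, q)
  lam2 <- rep(1, p); nu <- rep(1, p); tau2 <- 1; xi <- 1; sig2 <- 1
  rig  <- function(shape, rate) 1 / rgamma(1, shape, rate = rate)  # inverse-gamma draw
  keep <- matrix(NA_real_, n_iter, p)                  # store beta draws as an example

  eta <- drop(beta0 + Z %*% theta0 + X %*% beta + rowSums((X %*% Theta) * Z))
  for (it in seq_len(n_iter)) {
    # (1) Impute missing responses: y_i^miss | rest ~ N(eta_i, sig2)
    y[miss] <- rnorm(sum(miss), eta[miss], sqrt(sig2))

    # (2) Intercept and modifier main effects (flat priors)
    r  <- y - eta + beta0
    b0 <- rnorm(1, mean(r), sqrt(sig2 / n))
    eta <- eta + (b0 - beta0); beta0 <- b0
    r <- y - eta + drop(Z %*% theta0)
    S <- chol2inv(chol(crossprod(Z) / sig2 + diag(1e-8, q)))  # tiny ridge for stability
    m <- drop(S %*% crossprod(Z, r)) / sig2
    t0 <- m + drop(t(chol(S)) %*% rnorm(q))
    eta <- eta + drop(Z %*% (t0 - theta0)); theta0 <- t0

    # (3) Main effects and pliable effects, one predictor at a time
    for (j in seq_len(p)) {
      xj <- X[, j]
      r  <- y - eta + xj * beta[j]
      prec <- sum(xj^2) / sig2 + 1 / (lam2[j] * tau2)
      bj <- rnorm(1, sum(xj * r) / sig2 / prec, sqrt(1 / prec))
      eta <- eta + xj * (bj - beta[j]); beta[j] <- bj

      W <- xj * Z                                      # rows are x_ij * Z_i'
      r <- y - eta + drop(W %*% Theta[j, ])
      S <- chol2inv(chol(crossprod(W) / sig2 + diag(q) / (lam2[j] * tau2)))
      m <- drop(S %*% crossprod(W, r)) / sig2
      tj <- m + drop(t(chol(S)) %*% rnorm(q))
      eta <- eta + drop(W %*% (tj - Theta[j, ])); Theta[j, ] <- tj

      # (4) Shared local scale: heredity ties beta_j and theta_j together
      lam2[j] <- rig((q + 2) / 2, 1 / nu[j] + (beta[j]^2 + sum(Theta[j, ]^2)) / (2 * tau2))
      nu[j]   <- rig(1, 1 + 1 / lam2[j])
    }

    # (5) Global scale and noise variance
    tau2 <- rig((p * (q + 1) + 1) / 2, 1 / xi + sum((beta^2 + rowSums(Theta^2)) / lam2) / 2)
    xi   <- rig(1, 1 + 1 / tau2)
    sig2 <- rig(0.5 + n / 2, 0.5 + sum((y - eta)^2) / 2)

    keep[it, ] <- beta
  }
  list(beta_draws = keep, Theta = Theta, sigma2 = sig2)
}
```

Posterior summaries then come from the stored draws, e.g. colMeans(fit$beta_draws[-(1:500), ]) after discarding a burn-in, for fit <- gibbs_pliable_hs(y, X, Z).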

4. Empirical Performance and Model Selection

Simulation studies and real-data analyses demonstrate that Bayesian pliable lasso with horseshoe priors (labeled “pHS”) achieves superior estimation and prediction:

  • Lower squared $\ell_2$ loss for $\beta$ and $\theta$ (e.g., $\mathrm{Est}(\beta) \approx 0.02$, $\mathrm{Est}(\theta) \approx 0.07$ for $n = 500$), compared to the regular horseshoe or lasso (Mai, 9 Sep 2025)
  • Improved prediction error, robust to binary or continuous modifiers
  • Higher accuracy in variable selection, approaching zero false discovery and false positive rates for both main and interaction coefficients
  • In neuroimaging applications, pHS and the frequentist pliable lasso identify interpretable interaction effects, with credible intervals supporting uncertainty assessment

The model consistently outperforms non-interaction methods in both recovery of structured interactions and test-set prediction error. Inclusion of strong heredity via hierarchical priors leads to interpretable selection and avoids spurious modifier effects.

5. Computational Implementation

Efficient posterior computation is achieved by reparameterizing the half-Cauchy priors using auxiliary inverse-gamma variables:

$$\lambda_j^2 \sim \text{IG}(\tfrac{1}{2}, 1/\nu_j), \quad \nu_j \sim \text{IG}(\tfrac{1}{2}, 1)$$

This allows closed-form Gibbs updates for all components, maintaining scalability and numerical stability.
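Under this representation each scale update is a standard inverse-gamma computation. As a worked example (a routine derivation under the shared-scale hierarchy of Section 1, using the shape-rate convention for IG), combining the $\text{IG}(\tfrac{1}{2}, 1/\nu_j)$ prior on $\lambda_j^2$ with the $q+1$ Gaussian terms it scales gives

$$\lambda_j^2 \mid \text{rest} \sim \text{IG}\!\left(\frac{q+2}{2},\ \frac{1}{\nu_j} + \frac{\beta_j^2 + \lVert\theta_j\rVert^2}{2\tau^2}\right), \qquad \nu_j \mid \lambda_j^2 \sim \text{IG}\!\left(1,\ 1 + \frac{1}{\lambda_j^2}\right)$$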

The methodology has been implemented in the R package hspliable (https://github.com/tienmt/hspliable), leveraging high-performance C++ routines via Rcpp and RcppArmadillo. The package supports models with missing data, allows GLM extensions, and offers tools for full Bayesian inference and model summarization (Mai, 9 Sep 2025).

6. Relation to Existing Methods

Bayesian pliable lasso inherits key modeling strategies from the Bayesian adaptive lasso (Leng et al., 2010), including hierarchical shrinkage and model averaging for prediction, and from the general literature on scale-mixture priors for sparsity and interactions (Tibshirani et al., 2017, Rajaratnam et al., 2017). The horseshoe prior is favored for its robust handling of both strong and weak signals, outperforming Laplace (lasso) priors in ultra-sparse regimes.

A plausible implication is that the flexible hierarchical structure of the Bayesian pliable lasso could be extended to alternative sparsity-inducing priors, nonlocal penalties (Mallick et al., 2020), or spike-and-slab frameworks for predictive density optimality (Rockova, 2023). The general framework accommodates structured penalties and prior dependencies for enforcing more complex constraints (e.g., weak heredity, group structure), suggesting broad applicability to high-dimensional interaction modeling.

7. Significance and Outlook

Bayesian pliable lasso provides a principled approach to sparse interaction modeling in regression and GLM settings, enabling:

  • Strong heredity constraint enforcement through hierarchical shared shrinkage
  • Adaptive selection of important main and interaction effects
  • Coherent quantification of uncertainty in coefficients and predictions
  • Robust recovery under missing outcome data

The availability of fast Gibbs sampling, principled shrinkage, and modular software implementations makes this framework a competitive choice for practitioners dealing with heterogeneous and high-dimensional data subject to structured interactions. Further extensions may address generalized penalty structures, integration with network regularization (Shimamura et al., 2021), and optimal predictive inference in the sparse regime.


Table: Summary of Key Features of Bayesian Pliable Lasso (Mai, 9 Sep 2025)

| Design Element | Bayesian Pliable Lasso (pHS) | Classical Pliable Lasso |
|---|---|---|
| Penalty type | Hierarchical horseshoe prior | $\ell_1$ and group-$\ell_2$ penalties |
| Interaction modeling | Modifier effects $\theta_j$, strong heredity | Modifier effects $\theta_j$, strong heredity |
| Uncertainty quantification | Full posterior, credible intervals | Not available |
| Missing response handling | Bayesian data augmentation/imputation | Not directly supported |
| Computational engine | Gibbs sampling, Rcpp implementation | Coordinate descent, R/C++ |
| Model selection accuracy | High; low FDR/FPR; interpretable | High, but no uncertainty quantification |

Bayesian pliable lasso thus stands as a versatile, theoretically grounded, and practically validated approach for sparse regression modeling under interaction and heredity constraints.