
Bayesian Lasso Models

Updated 24 December 2025
  • Bayesian Lasso models are regularization techniques that impose Laplace priors to encourage sparsity in high-dimensional estimation and variable selection.
  • They use hierarchical formulations with scale mixtures of normals, facilitating efficient inference via Gibbs sampling, EM algorithms, and variational approximations.
  • Extensions like fused, group, and adaptive variants enhance model flexibility, improving prediction accuracy and structured learning in complex data settings.

Bayesian Lasso models are a class of Bayesian regularization methods for high-dimensional estimation, variable selection, and structure learning in regression, state-space models, and graphical modeling. The central feature is the imposition of an \ell_1 (Laplace) or generalized sparsity-inducing prior on coefficients or parameter differences, promoting shrinkage and exact or approximate sparsity. These priors are represented hierarchically, often as scale mixtures of normals, enabling efficient inference via Gibbs sampling, EM algorithms, or variational approximations. Numerous extensions include fused, group, adaptive, graphical, and horseshoe variants.

1. Hierarchical Formulation and Core Principles

At the heart of the Bayesian Lasso is a double-exponential (Laplace) prior on regression coefficients, which encourages sparsity by penalizing the absolute value of coefficients:

p(\beta_j \mid \lambda) = \frac{\lambda}{2}\exp(-\lambda |\beta_j|), \quad j = 1, \dots, p.

This can be cast in a scale-mixture-of-normals form:

\beta_j \mid \tau_j, \sigma^2 \sim N(0, \sigma^2\tau_j), \quad \tau_j \sim \mathrm{Exp}(\lambda^2/2).

This hierarchical representation leads to conditionally conjugate full conditionals, which are amenable to efficient block Gibbs sampling for posterior inference (Rajaratnam et al., 2017, Helgøy et al., 2019).
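This mixture identity can be verified numerically. The short NumPy sketch below (illustrative values for the penalty and sample size; σ² is fixed at 1 so the marginal matches the Laplace density displayed above) draws from the hierarchy and compares empirical tail probabilities of β_j with those of the implied Laplace marginal:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, sigma2, n = 2.0, 1.0, 500_000   # illustrative values

# Draw from the hierarchy: tau_j ~ Exp(lambda^2 / 2), then beta_j | tau_j ~ N(0, sigma^2 * tau_j).
tau = rng.exponential(scale=2.0 / lam**2, size=n)   # Exp(rate) sampled via scale = 1/rate
beta = rng.normal(0.0, np.sqrt(sigma2 * tau))

# With sigma^2 = 1 the marginal is Laplace with rate lambda, so P(|beta_j| > c) = exp(-lambda * c).
for c in (0.5, 1.0, 2.0):
    print(c, np.mean(np.abs(beta) > c), np.exp(-lam * c))
```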

In the Bayesian fused lasso, a Laplace prior is additionally imposed on coefficient differences, e.g., for ordered predictors:

\pi(\beta \mid \sigma^2) \propto \exp\biggl(-\frac{\lambda_1}{\sigma}\sum_{j=1}^p |\beta_j| - \frac{\lambda_2}{\sigma}\sum_{j=2}^p |\beta_j - \beta_{j-1}|\biggr)

with similar mixture-of-normals formulations for both terms (Kakikawa et al., 2022, Shimamura et al., 2016, Kakikawa et al., 2023).
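For intuition, the small helper below (a hypothetical function with illustrative penalty values) evaluates this unnormalized log prior and shows that a block-constant coefficient vector is favored over an equally sparse but non-constant one:

```python
import numpy as np

def log_fused_lasso_prior(beta, sigma=1.0, lam1=1.0, lam2=1.0):
    """Unnormalized log density of the fused lasso prior displayed above."""
    beta = np.asarray(beta, dtype=float)
    sparsity = (lam1 / sigma) * np.sum(np.abs(beta))          # shrinkage toward zero
    fusion = (lam2 / sigma) * np.sum(np.abs(np.diff(beta)))   # shrinkage of successive differences
    return -(sparsity + fusion)

flat = [0.0, 0.0, 2.0, 2.0, 2.0, 0.0]      # block-constant signal
wiggly = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0]    # same l1 norm, but many jumps
print(log_fused_lasso_prior(flat), log_fused_lasso_prior(wiggly))   # -10.0  -16.0
```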

2. Posterior Computation and Inference

Once the latent scale variables are introduced, Bayesian lasso models admit standard-form full conditionals. The canonical blocked Gibbs sampler cycles through (i) the regression coefficients, drawn from a multivariate normal, (ii) the local scales, drawn from inverse-Gaussian full conditionals, and (iii) the error variance and global penalty, drawn from inverse-gamma and gamma full conditionals (Rajaratnam et al., 2017, Kakikawa et al., 2022).
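A minimal sketch of such a sampler for the linear-regression Bayesian lasso appears below. It assumes the scale-mixture hierarchy of Section 1 with an inverse-gamma hyperprior on σ² and a gamma hyperprior on λ² (Park-Casella-style full conditionals; the function name, hyperparameter defaults, and jitter guard are illustrative choices, not taken from the cited papers):

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, n_iter=5000, a0=1.0, b0=1.0, r0=1.0, d0=1.0, seed=0):
    """Sketch of a blocked Gibbs sampler for the Bayesian lasso hierarchy above."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta, tau, sigma2, lam2 = np.zeros(p), np.ones(p), 1.0, 1.0
    draws = np.empty((n_iter, p))

    for t in range(n_iter):
        # (i) beta | rest ~ N(A^{-1} X'y, sigma^2 A^{-1}), with A = X'X + D_tau^{-1}
        A = XtX + np.diag(1.0 / tau)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))

        # (ii) 1/tau_j | rest ~ Inverse-Gaussian(sqrt(lambda^2 sigma^2 / beta_j^2), lambda^2)
        mu_ig = np.sqrt(lam2 * sigma2 / np.maximum(beta**2, 1e-12))
        tau = 1.0 / rng.wald(mu_ig, lam2)

        # (iii) sigma^2 | rest ~ Inv-Gamma(a0 + (n+p)/2, b0 + ||y - X beta||^2 / 2 + beta' D_tau^{-1} beta / 2)
        resid = y - X @ beta
        rate = b0 + 0.5 * (resid @ resid + np.sum(beta**2 / tau))
        sigma2 = 1.0 / rng.gamma(a0 + 0.5 * (n + p), 1.0 / rate)

        # (iv) lambda^2 | rest ~ Gamma(r0 + p, d0 + sum_j tau_j / 2)
        lam2 = rng.gamma(r0 + p, 1.0 / (d0 + 0.5 * np.sum(tau)))

        draws[t] = beta
    return draws
```

Drawing β as a single multivariate normal block, rather than coordinate-wise, is what underlies the improved mixing of blocked samplers discussed in Section 4.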

For models with fused or structured priors, additional latent variables (scales for differences, indicators for group membership or structural zeros) are updated similarly. In fused lasso state-space models, forward filtering-backward sampling (FFBS) can still be applied when the augmented state equation remains linear-Gaussian conditioned on latent variables (Irie, 2019).

Variational inference has been developed as an alternative, with closed-form mean-field CAVI updates involving normal, gamma, and generalized inverse-Gaussian variational factors (González-Navarrete et al., 18 Jun 2024, Tung et al., 2016).
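Schematically, such schemes posit a fully factorized approximation and apply the generic coordinate-ascent identity (written here in general form; the specific factors follow from the scale-mixture hierarchy above):

q(\beta, \tau, \sigma^2, \lambda^2) = q(\beta)\,q(\tau)\,q(\sigma^2)\,q(\lambda^2), \qquad \log q^{\ast}(\theta_k) = \mathbb{E}_{q(\theta_{-k})}\bigl[\log p(y, \beta, \tau, \sigma^2, \lambda^2)\bigr] + \mathrm{const},

where the expectations yield normal factors for the coefficients, generalized inverse-Gaussian factors for the local scales, and gamma-type factors for the precision and penalty parameters, consistent with the updates cited above.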

Because the posterior mean under the Laplace prior shrinks coefficients without setting them exactly to zero, type-II maximum likelihood (empirical Bayes) approaches or sparse fused algorithms are used when exact zeros are required (Helgøy et al., 2019, Shimamura et al., 2016).

3. Structured and Adaptive Extensions

Bayesian lasso methodology has been extended in numerous directions:

  • Fused Lasso: Shrinks both coefficients and their differences, with NEG and horseshoe priors on differences providing heavier-tailed, spike-and-slab, or adaptively fused structures, preserving large jumps while encouraging fusion elsewhere (Shimamura et al., 2016, Kakikawa et al., 2022, Kakikawa et al., 2023).
  • Dynamic and State-Space Models: Dynamic fused lasso imposes double-exponential shrinkage toward the previous state and zero, with tractable conditionally Gaussian FFBS and explicit log-geometric mixture representations, and can be extended hierarchically to induce horseshoe-like marginal shrinkage (Irie, 2019).
  • Group and Adaptive Lasso: Group lasso priors are implemented as multivariate Laplace distributions or their scale-mixture forms, shrinking coefficient blocks together. Adaptive lasso introduces coefficient-specific shrinkage, realized via individual penalty parameters with gamma or empirical Bayes updates (Reeder et al., 9 Jan 2024, Li et al., 2015, Tung et al., 2016); a sketch of the gamma update follows this list.
  • Spike-and-Slab: Continuous spike-and-slab Laplace mixtures enable both exact zeros and slab-penalized large signals, yielding efficient self-adaptive model selection in high-dimensional settings and graphical structure estimation (Li et al., 2018).
  • Graphical and Chain Graph Lasso: Laplace, group, and spike-and-slab priors on graphical precision matrices or chain graph parameters facilitate sparse structure learning and parsimonious graphical model selection (Shen et al., 2020, Talluri et al., 2013).
  • Robust and Extreme Value Variants: Alternative likelihoods (Huber, hyperbolic, EGPD) can be combined with lasso priors for robustness to outliers or heavy tails, including extensions to tail inference and structured modeling of conditional extremes (Kawakami et al., 2022, Carvalho et al., 2020).
  • Nonparametric and Community Models: Dirichlet process mixtures of lasso priors enable clustering and adaptive shrinkage of parameter groups, especially in high-dimensional VAR and network settings (Billio et al., 2016).
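As a concrete illustration of the adaptive variant listed above, the only change to the sampler sketched in Section 2 is that each coefficient receives its own penalty λ_j with a conjugate gamma full conditional. The helper below is a hypothetical sketch assuming a Gamma(r0, d0) hyperprior on each λ_j²:

```python
import numpy as np

def update_adaptive_penalties(tau, r0=1.0, d0=1.0, rng=None):
    """One Gibbs step for coefficient-specific penalties in an adaptive Bayesian lasso.

    With tau_j ~ Exp(lambda_j^2 / 2) and lambda_j^2 ~ Gamma(r0, d0), conjugacy gives
    lambda_j^2 | tau_j ~ Gamma(r0 + 1, d0 + tau_j / 2) independently for each j.
    """
    rng = rng or np.random.default_rng()
    tau = np.asarray(tau, dtype=float)
    return rng.gamma(r0 + 1.0, 1.0 / (d0 + 0.5 * tau))
```

The inverse-Gaussian update for the local scales then uses λ_j² in place of the single global λ².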

4. Theoretical Properties and Convergence

Two-step blocked Gibbs samplers for the Bayesian lasso have trace-class Markov operators, are geometrically ergodic, and admit explicit convergence bounds that improve on those of unblocked samplers (Rajaratnam et al., 2017, Cui et al., 23 Dec 2025). For log-concave likelihoods (including probit, logistic, and certain Gaussian error models), the mixing times of the canonical data-augmentation samplers scale polynomially in the sample size and the number of coefficients, up to logarithmic factors, provided the penalty grows moderately with the sample size (Cui et al., 23 Dec 2025). The conductance-based analysis ties mixing rates to spectral-gap lower bounds, with practical implications for warm starts and Monte Carlo error control.

Empirical comparisons consistently demonstrate improved effective sample sizes and lower autocorrelation for blocked versus standard Gibbs or MCMC samplers. Type-II maximum likelihood (BLS, ARD) approaches give exact zeros, while marginalized Bayesian samplers produce only soft-shrunk coefficients absent thresholding (Helgøy et al., 2019).

5. Model Selection, Tuning, and Practical Implementation

Penalty hyperparameters can be updated either in a fully Bayesian regime with hyperpriors or via empirical Bayes/MCEM updates, with the latter often improving mixing and stability in high dimensions (Reeder et al., 9 Jan 2024, Tung et al., 2016). Posterior variable selection can be based on marginal credible intervals, thresholded scaled-neighborhood probabilities, or sparsity-inducing EM algorithms. For grouped or graphical settings, group-wise or edge-wise penalties can be learned via conjugate updates or data-driven adaptive priors (Shen et al., 2020, Li et al., 2018).
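For example, a minimal credible-interval selection rule applied to posterior draws (such as those returned by the sampler sketched in Section 2; the 95% level is an arbitrary illustrative choice) can be written as:

```python
import numpy as np

def select_by_credible_interval(beta_draws, level=0.95):
    """Flag coefficients whose equal-tailed marginal credible interval excludes zero.

    beta_draws: array of shape (n_draws, p) containing posterior samples of beta.
    Returns a boolean array of length p marking the selected coefficients.
    """
    alpha = 1.0 - level
    lo = np.quantile(beta_draws, alpha / 2, axis=0)
    hi = np.quantile(beta_draws, 1.0 - alpha / 2, axis=0)
    return (lo > 0.0) | (hi < 0.0)
```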

Computationally, each Gibbs or blocked sampler iteration typically requires \mathcal{O}(np^2) effort for regression models, though special structure (e.g., banded, tridiagonal, Kronecker, or block sparsity) can reduce cost, especially in state-space or graphical models (Irie, 2019, Kakikawa et al., 2023, Shen et al., 2020). MCMC, coordinate ascent variational inference, and fast EM/ARD-type thresholding are all feasible depending on context (González-Navarrete et al., 18 Jun 2024, Helgøy et al., 2019).

6. Comparative Performance and Empirical Results

Simulation and real-data studies demonstrate that Bayesian lasso and its structured extensions can outperform classical lasso/elastic-net in terms of mean squared error, prediction error, variable selection fidelity, and block or group recovery—especially when the underlying signal is block-constant, group-sparse, or exhibits abrupt regime changes (Shimamura et al., 2016, Kakikawa et al., 2022, Kakikawa et al., 2023, Reeder et al., 9 Jan 2024). Horseshoe and NEG priors on differences reduce bias for large genuine jumps and yield sharper segmentation, while ARD- and spike-and-slab-type approaches further improve sparsity and reduce bias on large coefficients (Li et al., 2018, Helgøy et al., 2019).

Bayesian variants for binary data (logistic regression), generalized linear mixed models, and time-to-event models with censoring show improved selection accuracy and predictive stability over both frequentist and standard Bayesian regularization approaches (Kakikawa et al., 2023, Tung et al., 2016, Reeder et al., 9 Jan 2024). For graphical models, Bayesian lasso-type and spike-and-slab priors facilitate simultaneous model selection and parameter estimation, often yielding exact zeros and adaptively borrowing strength across structures (Talluri et al., 2013, Li et al., 2018).

Empirical robustness (e.g., under outlier contamination) is further enhanced by Huberized or heavy-tailed-likelihood Bayesian lasso variants, with performance closely matching or improving on Student-t or median-regression approaches (Kawakami et al., 2022).

