
Dirichlet–Laplace Priors in Sparse Inference

Updated 13 January 2026
  • Dirichlet–Laplace priors are global–local shrinkage methods designed for high-dimensional sparse Bayesian inference, combining simplex-constrained weights with Laplace mixtures.
  • They achieve optimal posterior contraction rates and robust variable selection in regression and normal means models via conjugate-friendly Gibbs sampling.
  • Practical implementations use corrected sampling algorithms and alternative parameterizations to enhance computational efficiency and accurate uncertainty quantification.

The Dirichlet–Laplace (DL) prior is a global–local shrinkage prior designed for high-dimensional Bayesian inference under sparsity. It combines a simplex-constrained vector of local scales (Dirichlet weights) with Laplace or Gaussian scale mixtures, producing pronounced shrinkage near zero while retaining heavy tails for large signals. The DL prior delivers optimal posterior contraction rates, efficient computation via conjugate-friendly Gibbs sampling, and robust variable selection in regression and normal means models. Theoretical advances and practical implementation strategies clarify its analytic properties and correct subtle errors in earlier simulation algorithms.

1. Hierarchical Model and Marginal Representation

Consider the normal means model, $y_i = \theta_i + \varepsilon_i$, $\varepsilon_i \sim N(0,1)$, for $i = 1, \ldots, n$. The DL prior introduces three layers of latent variables:

  • Local scale parameters: $\psi_j \sim \mathrm{Exp}(1/2)$ for $j = 1, \ldots, n$.
  • Simplex-constrained weights: $\phi = (\phi_1, \ldots, \phi_n) \sim \mathrm{Dirichlet}(a, \ldots, a)$.
  • Global scale: $\tau \sim \mathrm{Gamma}(n a, 1/2)$.

The hierarchical prior for $\theta_j$ is:

$$\theta_j \mid \psi_j, \phi_j, \tau \sim N(0, \psi_j \phi_j^2 \tau^2).$$
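The hierarchy above can be simulated directly, which gives a quick sanity check on the parameterization. The following is a minimal NumPy sketch, assuming that Exp(1/2) and Gamma(na, 1/2) are rate-parameterized (so NumPy's `scale` argument is 2); the function name `sample_dl_prior` and the values of `n` and `a` are illustrative choices, not taken from the cited papers.

```python
import numpy as np


def sample_dl_prior(n, a, size=1, seed=0):
    """Draw `size` vectors theta of length n from the DL hierarchy (sketch)."""
    rng = np.random.default_rng(seed)
    # Assumption: Exp(1/2) is rate-parameterized, so the mean is 2.
    psi = rng.exponential(scale=2.0, size=(size, n))
    # Simplex-constrained weights: each row sums to one.
    phi = rng.dirichlet(np.full(n, a), size=size)
    # Assumption: Gamma(n*a, 1/2) has shape n*a and rate 1/2 (scale 2).
    tau = rng.gamma(shape=n * a, scale=2.0, size=(size, 1))
    # theta_j | psi_j, phi_j, tau ~ N(0, psi_j * phi_j^2 * tau^2)
    theta = rng.standard_normal((size, n)) * np.sqrt(psi) * phi * tau
    return theta, psi, phi, tau


theta, psi, phi, tau = sample_dl_prior(n=50, a=0.5, size=1000)
print("median |theta_j| under the prior:", np.median(np.abs(theta)))
```

Most prior draws sit very close to zero while a few coordinates are large, which is the qualitative behaviour the prior is built to produce.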

Integrating over $\psi_j$ produces $\theta_j \mid \phi_j, \tau \sim \mathrm{DE}(\phi_j \tau)$, i.e., double-exponential (Laplace) marginals. Further marginalization over $\phi$ and $\tau$ yields a density for $\theta_j$ sharply peaked at zero and with exponential or polynomial tails, depending on $a$ and the integration over Dirichlet and Gamma components (Bhattacharya et al., 2014, Bhattacharya et al., 2012).
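The Laplace marginal can be checked with a short characteristic-function calculation. Write $c = \phi_j \tau$ for the conditional scale and read $\mathrm{Exp}(1/2)$ as an exponential distribution with rate $1/2$ (an assumption about the parameterization). Conditioning on $\psi_j$ first and then averaging over it gives

$$E\left[e^{it\theta_j} \mid \phi_j, \tau\right] = E\left[e^{-t^2 c^2 \psi_j / 2}\right] = \int_0^\infty \tfrac{1}{2}\, e^{-\psi/2}\, e^{-t^2 c^2 \psi / 2}\, d\psi = \frac{1}{1 + c^2 t^2},$$

which is the characteristic function of the Laplace density $(2c)^{-1} \exp(-|\theta_j| / c)$ with scale $c = \phi_j \tau$.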

An alternative parameterization replaces Dirichlet and Gamma with independent Gamma variables: letting $\lambda_j = \phi_j \tau$, $\lambda_j \sim \mathrm{Gamma}(a, 1/2)$ independently, so

$$\theta_j \mid \psi_j, \lambda_j \sim N(0, \psi_j \lambda_j^2).$$

This version facilitates implementation and eliminates redundancy (Gruber et al., 16 Aug 2025).
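The equivalence behind this parameterization rests on a standard Gamma–Dirichlet fact: if the weights are Dirichlet(a, ..., a) and the independent global scale is Gamma(na, 1/2), the products of weight and global scale are independent Gamma(a, 1/2) variables. A small Monte Carlo sketch, assuming rate-parameterized Gammas and using illustrative values of `n` and `a`, compares the first two moments of both routes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, a, draws = 5, 0.5, 40_000  # illustrative values

# Route 1: Dirichlet weights times a Gamma(n*a, rate 1/2) global scale.
phi = rng.dirichlet(np.full(n, a), size=draws)
tau = rng.gamma(shape=n * a, scale=2.0, size=(draws, 1))
lam_from_dl = (phi * tau).ravel()

# Route 2: independent Gamma(a, rate 1/2) scales drawn directly.
lam_direct = rng.gamma(shape=a, scale=2.0, size=draws * n)

# A Gamma(a, rate 1/2) variable has mean 2a and variance 4a.
print("means:    ", lam_from_dl.mean(), lam_direct.mean(), 2 * a)
print("variances:", lam_from_dl.var(), lam_direct.var(), 4 * a)
```

Matching moments do not prove equality in distribution, but a mismatch here would flag a rate-versus-scale mistake in an implementation.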

2. Global–Local Shrinkage Mechanism

DL priors effect shrinkage by combining a sum-to-one constraint on local weights (the simplex) with a global scale. This architecture produces "joint shrinkage", suppressing all but a few signals via the spike at zero. For very sparse signals, the DL prior matches the concentration properties of discrete spike-and-slab mixtures, yet retains computational tractability (Bhattacharya et al., 2012).

Comparative properties:

  • Laplace (Bayesian Lasso): Exponential peak at zero, lighter tails; tends to over-shrink large coefficients.
  • Horseshoe: Heavy, polynomially decaying tails but a weaker central spike.
  • Dirichlet–Laplace: Adjustable spike via $a$ (small $a$ gives a near point mass), exponential-to-Pareto tails (Zhang et al., 2016).
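These qualitative differences can be made concrete by Monte Carlo. The sketch below estimates, for one coordinate, the prior mass near zero and in the far tail under each prior. The unit Laplace scale, the horseshoe global scale fixed at 1, the value `a = 0.1`, and the cut-offs 0.05 and 10 are all illustrative assumptions; the DL draw uses the representation of a Laplace variable with an independent Gamma(a, rate 1/2) scale.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200_000  # Monte Carlo sample size (illustrative)

# Laplace (Bayesian Lasso) with unit scale.
laplace = rng.laplace(scale=1.0, size=m)
# Horseshoe with global scale 1: N(0, lam^2) with lam ~ half-Cauchy(0, 1).
horseshoe = rng.standard_normal(m) * np.abs(rng.standard_cauchy(m))
# DL marginal for one coordinate: unit Laplace times a Gamma(a, rate 1/2) scale.
a = 0.1
dl = rng.laplace(scale=1.0, size=m) * rng.gamma(shape=a, scale=2.0, size=m)


def mass_near_zero(x, eps=0.05):
    """Fraction of draws with |x| < eps."""
    return np.mean(np.abs(x) < eps)


def tail_mass(x, t=10.0):
    """Fraction of draws with |x| > t."""
    return np.mean(np.abs(x) > t)


results = {
    name: (mass_near_zero(x), tail_mass(x))
    for name, x in [("laplace", laplace), ("horseshoe", horseshoe), ("dl", dl)]
}
for name, (near, tail) in results.items():
    print(f"{name:10s} P(|theta|<0.05)={near:.4f}  P(|theta|>10)={tail:.5f}")
```

The numbers depend on the chosen scales, so they illustrate the shapes of the priors rather than rank them.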

3. Theoretical Guarantees and Posterior Concentration

DL priors achieve minimax posterior contraction rates in sparse high-dimensional settings. In the normal means model, with $q_n$ true nonzero coordinates in $\theta_0$, set $s_n^2 = q_n \log(n / q_n)$. Under a suitable choice of the Dirichlet parameter $a$, the posterior contracts at rate $s_n$:

$$E_{\theta_0}\, \Pi\left( \|\theta - \theta_0\|_2 > M s_n \mid y \right) \to 0 \quad \text{for some constant } M > 0,$$

provided the regularity conditions stated in (Bhattacharya et al., 2014) hold. The induced support size of $\theta$ remains proportional to $q_n$, avoiding over-selection of spurious features.
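To give a sense of scale, the sketch below evaluates the squared minimax rate for sparse normal means, taken here in its standard form of q log(n/q) up to constants (an assumption about the expression intended above), and compares it with the squared-error risk n of the unshrunk estimate y. The values of `n` and `q` are illustrative.

```python
import math


def sparse_minimax_rate_sq(n, q):
    """Squared rate q * log(n / q), up to constants (assumed standard form)."""
    return q * math.log(n / q)


n, q = 10_000, 10  # illustrative dimension and sparsity level
s2 = sparse_minimax_rate_sq(n, q)
# The unshrunk estimate y has expected squared error n under unit noise.
print(f"s_n^2 = {s2:.2f} versus n = {n} for the unshrunk estimate")
```

With ten signals among ten thousand coordinates, the attainable squared error is smaller than that of the raw observations by roughly two orders of magnitude.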

In linear regression ($y = X\beta + \varepsilon$), consistent posterior contraction and selection hold under the regularity conditions of Zhang et al. (2016) and a suitably shrinking Dirichlet parameter $a$.

4. Posterior Computation: Gibbs Samplers and Algorithmic Corrections

DL posterior draws are computable via blocked Gibbs sampling exploiting the scale-mixture Gaussian and normalized random measure identities. Key update steps:

  • $\theta$ update: Jointly normal, conditionally independent given $(\psi, \phi, \tau)$ or, in the independent-Gamma parameterization, $(\psi, \lambda)$ with $\lambda_j = \phi_j \tau$.
  • Local scale update: $1/\psi_j$ sampled as inverse-Gaussian.
  • Global scale update: $\tau$ as generalized inverse-Gaussian.
  • Simplex weights update: $\phi$ via normalized random measure draws, i.e., $T_j \sim \mathrm{giG}(a - 1, 1, 2|\theta_j|)$, $\phi_j = T_j / \sum_k T_k$ (Bhattacharya et al., 2012, Gruber et al., 16 Aug 2025).

An error in the original Bhattacharya et al. (2015) sampler concerned the order of updates: sampling $\psi$ before $(\phi, \tau)$, and using conditionals that had already integrated out $\psi$, violates the marginal-conditional factorization and leads to incorrect stationary distributions (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025). Corrected algorithms respect the sequence $\phi \to \tau \to \psi$ within blocks (or use the $\lambda_j = \phi_j \tau$ parameterization for independent updates).

Per iteration cost is $O(n)$, dominated by fast inverse-Gaussian and GIG draws. Mixing is efficient, but global shrinkage parameters (e.g., $\tau$) require diagnostic monitoring.
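As an illustration of a correctly ordered sampler, here is a minimal sketch for the normal means model in the independent-Gamma parameterization, where each coordinate has its own Gamma(a, rate 1/2) scale. The conditionals are derived here from the stated hierarchy rather than copied from the cited papers: the Gamma scale is drawn with the exponential local scale integrated out, and only then is the local scale drawn given the new Gamma scale. giG(p, rho, chi) denotes the density proportional to x^(p-1) exp(-(rho x + chi / x) / 2). The function names, the clamp `eps` (a numerical safeguard that slightly perturbs the exact kernel), and the toy data are assumptions.

```python
import numpy as np
from scipy.stats import geninvgauss


def sample_gig(p, rho, chi, rng):
    """Draw from giG(p, rho, chi) via SciPy's two-parameter geninvgauss."""
    b = np.sqrt(rho * chi)
    scale = np.sqrt(chi / rho)
    return geninvgauss.rvs(np.full_like(b, p), b, scale=scale,
                           size=b.shape, random_state=rng)


def dl_gibbs_normal_means(y, a=0.5, n_iter=300, burn_in=100, seed=0, eps=1e-8):
    """Gibbs sketch for y_j = theta_j + N(0, 1) noise under a DL prior."""
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    psi = np.ones(n)  # local exponential scales
    lam = np.ones(n)  # independent Gamma scales (weight times global scale)
    draws = np.empty((n_iter - burn_in, n))
    for it in range(n_iter):
        # 1. theta | psi, lam, y: independent normals.
        prior_var = psi * lam**2
        post_var = prior_var / (1.0 + prior_var)
        theta = post_var * y + np.sqrt(post_var) * rng.standard_normal(n)
        abs_theta = np.maximum(np.abs(theta), eps)  # numerical safeguard
        # 2. lam | theta, with psi integrated out: giG(a - 1, 1, 2|theta|).
        lam = np.maximum(sample_gig(a - 1.0, 1.0, 2.0 * abs_theta, rng), eps)
        # 3. psi | lam, theta: 1/psi is inverse-Gaussian(lam/|theta|, 1).
        inv_psi = rng.wald(lam / abs_theta, 1.0)
        psi = 1.0 / np.maximum(inv_psi, eps)
        if it >= burn_in:
            draws[it - burn_in] = theta
    return draws


# Toy data (illustrative): three strong signals, the rest near zero.
y = np.full(20, 0.3)
y[:3] = 10.0
draws = dl_gibbs_normal_means(y, a=0.5, n_iter=300, burn_in=100, seed=1)
post_mean = draws.mean(axis=0)
print(np.round(post_mean[:5], 2))
```

Swapping steps 2 and 3 while keeping the same formulas would reproduce the ordering error described above, because step 2 uses a conditional that has already marginalized over the local scale.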

5. Applications in Regression, Variable Selection, and Performance

The DL prior supports high-dimensional linear regression, sparse estimation, and variable selection tasks. In regression, the model

$$y = X\beta + \varepsilon$$

assigns each coefficient $\beta_j$ a DL prior, facilitating sparse recovery via penalized credible regions:

  • Compute joint posterior credible ellipsoids;
  • Find the sparsest $\beta$ in the region via $\ell_0$ minimization or penalization;
  • Hyperparameters (e.g., Dirichlet $a$) can be objectively tuned by minimizing discrepancy between induced $R^2$ distribution and a target Beta prior (Zhang et al., 2016).
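The search for a sparse point inside a credible ellipsoid can be sketched with a convex surrogate. The example below assumes one common choice, an adaptively weighted L1 penalty added to the ellipsoid's quadratic form, solved by coordinate descent. The objective, the function name `sparsify_in_credible_region`, the penalty value, and the toy posterior summary are all illustrative assumptions and may differ from the procedure in the cited work.

```python
import numpy as np


def sparsify_in_credible_region(post_mean, post_cov, penalty, n_sweeps=100):
    """Coordinate descent for the assumed surrogate objective

        (b - m)' Q (b - m) + penalty * sum_j |b_j| / m_j**2,

    where m is the posterior mean and Q the inverse posterior covariance.
    Requires every m_j to be nonzero (true for MCMC averages in practice).
    """
    m = np.asarray(post_mean, dtype=float)
    Q = np.linalg.inv(np.asarray(post_cov, dtype=float))
    w = 1.0 / m**2  # adaptive weights: small posterior means are penalized more
    b = m.copy()
    for _ in range(n_sweeps):
        for j in range(m.size):
            # Linear coefficient of b_j with the other coordinates held fixed.
            cross = Q[j] @ (b - m) - Q[j, j] * (b[j] - m[j])
            r = Q[j, j] * m[j] - cross
            # Soft-thresholding solves the one-dimensional problem exactly.
            thresh = 0.5 * penalty * w[j]
            b[j] = np.sign(r) * max(abs(r) - thresh, 0.0) / Q[j, j]
    return b


# Toy posterior summary (illustrative): two clear signals, two near-zero means.
m = np.array([3.0, 0.05, -2.0, 0.02])
cov = 0.01 * np.eye(4)
b = sparsify_in_credible_region(m, cov, penalty=0.1)
print(b, b != 0)
```

Increasing `penalty` traces out a path of progressively sparser models; in practice the posterior mean and covariance would come from MCMC draws rather than being fixed by hand.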

Simulation and microarray analyses indicate that DL priors outperform the Bayesian Lasso (Laplace) and horseshoe alternatives in terms of mean squared error, support recovery, and interpretability, especially in settings where signals are sparse and correlations are moderate to high (Bhattacharya et al., 2014, Zhang et al., 2016).

6. Practical Recommendations and Implementation Notes

For efficient and correct computation:

  • Always use the corrected Gibbs sampler or the simplified independent-Gamma ($\lambda_j$-based) scheme to correctly target the posterior (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025).
  • Employ specialized C/C++ routines for GIG and inverse-Gaussian draws.
  • Monitor mixing of global shrinkage parameters; consider re-parameterizations or thinning to improve autocorrelations.
  • For very small $a$, e.g., $a = 1/n$, numerical instabilities may arise; the alternative independent-Gamma parametrization tends to be more robust.
  • Parallelize coordinatewise updates for large $n$.
  • Validate implementation by “Getting it right” tests or by comparison with small-$n$ marginal samplers.
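For the mixing recommendation above, a basic diagnostic is the effective sample size of a scalar chain. The sketch below uses a simple autocorrelation-sum estimator truncated at the first non-positive lag, which is one of several possible conventions; the function name and the synthetic chains (stand-ins for sampler output, not DL draws) are assumptions.

```python
import numpy as np


def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = x @ x / n
    rho_sum = 0.0
    for lag in range(1, n // 2):
        rho = (x[:-lag] @ x[lag:]) / (n * var)
        if rho <= 0.0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = np.random.default_rng(2)
n_draws = 5000
# Stand-in for a well-mixing chain: independent draws.
iid_chain = rng.standard_normal(n_draws)
# Stand-in for a slowly mixing chain: AR(1) with coefficient 0.9.
ar_chain = np.empty(n_draws)
ar_chain[0] = 0.0
for t in range(1, n_draws):
    ar_chain[t] = 0.9 * ar_chain[t - 1] + rng.standard_normal()

ess_iid = effective_sample_size(iid_chain)
ess_ar = effective_sample_size(ar_chain)
print(f"ESS iid: {ess_iid:.0f}, ESS AR(1): {ess_ar:.0f}")
```

Applied to the trace of a global scale, an effective sample size far below the number of iterations suggests running longer or switching parameterization.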

An R package “DirLapl” implements the corrected sampler (Onorati et al., 7 Jul 2025).

7. Impact and Current Best Practices

DL priors blend computational tractability, theoretical optimality under sparsity, and flexibility via global–local architecture. Best practices include setting $a = 1/n$ or $a = 1/2$, using gamma global scales for closed-form updates, and validating MCMC chains against the target posterior (Bhattacharya et al., 2014, Gruber et al., 16 Aug 2025).

When embedded in regression or penalized credible region frameworks, DL priors yield consistent support recovery and improved prediction error, and they are applicable to massive-dimensional data, including genetics and microarray analyses (Zhang et al., 2016). Theoretical results and corrected algorithms support reliable uncertainty quantification and robust variable selection.

A plausible implication is that, owing to the spike-at-zero and heavy tails, DL priors should be preferred in high-dimensional sparse modeling where both accurate zero selection and minimal shrinkage of true signals are required. All substantive theoretical properties and large-sample guarantees remain valid under corrected samplers (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025).
