Dirichlet–Laplace Priors in Sparse Inference
- Dirichlet–Laplace priors are global–local shrinkage methods designed for high-dimensional sparse Bayesian inference, combining simplex-constrained weights with Laplace mixtures.
- They achieve optimal posterior contraction rates and robust variable selection in regression and normal means models via conjugate-friendly Gibbs sampling.
- Practical implementations use corrected sampling algorithms and alternative parameterizations to enhance computational efficiency and accurate uncertainty quantification.
The Dirichlet–Laplace (DL) prior is a global–local shrinkage prior designed for high-dimensional Bayesian inference under sparsity. It combines a simplex-constrained vector of local scales (Dirichlet weights) with Laplace or Gaussian scale mixtures, producing pronounced shrinkage near zero while retaining heavy tails for large signals. The DL prior delivers optimal posterior contraction rates, efficient computation via conjugate-friendly Gibbs sampling, and robust variable selection in regression and normal means models. Theoretical advances and practical implementation strategies clarify its analytic properties and correct subtle errors in earlier simulation algorithms.
1. Hierarchical Model and Marginal Representation
Consider the normal means model, $y_j = \theta_j + \epsilon_j$, $\epsilon_j \sim N(0, 1)$, for $j = 1, \dots, n$. The DL prior introduces three layers of latent variables:
- Local scale parameters: $\psi_j \sim \mathrm{Exp}(1/2)$ for $j = 1, \dots, n$.
- Simplex-constrained weights: $\phi = (\phi_1, \dots, \phi_n) \sim \mathrm{Dir}(a, \dots, a)$.
- Global scale: $\tau \sim \mathrm{Gamma}(na, 1/2)$.
The hierarchical prior for $\theta_j$ is:
$$\theta_j \mid \psi_j, \phi_j, \tau \sim N(0, \psi_j \phi_j^2 \tau^2).$$
Integrating over $\psi_j$ produces double-exponential (Laplace) marginals, $\theta_j \mid \phi_j, \tau \sim \mathrm{DE}(\phi_j \tau)$. Further marginalization over $\phi$ and $\tau$ yields a density for $\theta_j$ sharply peaked at zero and with exponential or polynomial tails, depending on $a$ and the integration over Dirichlet and Gamma components (Bhattacharya et al., 2014, Bhattacharya et al., 2012).
An alternative parameterization replaces the Dirichlet and Gamma layers with independent Gamma variables: letting $\lambda_j \sim \mathrm{Gamma}(a, 1/2)$ independently, set $\phi_j = \lambda_j / \sum_k \lambda_k$ and $\tau = \sum_k \lambda_k$, so
$$\theta_j \mid \lambda_j \sim \mathrm{DE}(\lambda_j).$$
This version facilitates implementation and eliminates redundancy (Gruber et al., 16 Aug 2025).
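Under the $\lambda$ parameterization, a prior draw requires only independent Gamma and Laplace variates. A minimal sketch in Python/NumPy (the function name is illustrative, not from the cited packages):

```python
import numpy as np

rng = np.random.default_rng(0)

def dl_prior_draws(n, a, size=1, rng=rng):
    """Draw theta ~ DL(a) via the lambda parameterization:
    lambda_j ~ Gamma(a, rate 1/2), then theta_j | lambda_j ~ DE(lambda_j)."""
    # NumPy's gamma takes a *scale* parameter, so rate 1/2 means scale 2.
    lam = rng.gamma(shape=a, scale=2.0, size=(size, n))
    # Laplace (double-exponential) with per-coordinate scale lambda_j.
    return rng.laplace(loc=0.0, scale=lam)

theta = dl_prior_draws(n=1000, a=0.5, size=1)
```

Because the $\lambda_j$ are independent, draws vectorize trivially, which is part of why this parameterization simplifies implementation.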
2. Global–Local Shrinkage Mechanism
DL priors effect shrinkage by combining a sum-to-one constraint on local weights (the simplex) with a global scale. This architecture produces "joint shrinkage", suppressing all but a few signals via the spike at zero. For very sparse signals (e.g., $\ell_0$-sparse vectors with $q_n = o(n)$ nonzero entries), the DL prior matches the concentration properties of discrete spike-and-slab mixtures, yet retains computational tractability (Bhattacharya et al., 2012).
Comparative properties:
- Laplace (Bayesian Lasso): exponential peak at zero with lighter tails; tends to over-shrink large coefficients.
- Horseshoe: heavy polynomial (Cauchy-like) tails but a weaker central spike.
- Dirichlet–Laplace: adjustable spike via $a$ (small $a$ gives near point-mass behavior at zero), with exponential-to-Pareto tails (Zhang et al., 2016).
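The difference in spike behavior is easy to check empirically by comparing the prior mass near zero of a standard Laplace prior against DL draws with small $a$ (a Monte Carlo sketch; the threshold and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100_000

# Standard Laplace(0, 1) prior draws.
laplace_draws = rng.laplace(0.0, 1.0, size=m)

# DL(a) prior draws via the lambda parameterization: lambda ~ Gamma(a, rate 1/2).
a = 0.1
lam = rng.gamma(a, scale=2.0, size=m)
dl_draws = rng.laplace(0.0, lam)

eps = 0.01
lap_frac = np.mean(np.abs(laplace_draws) < eps)
dl_frac = np.mean(np.abs(dl_draws) < eps)
print(f"P(|theta| < {eps}): Laplace {lap_frac:.3f}, DL(a={a}) {dl_frac:.3f}")
```

With $a = 0.1$, a large share of the Gamma local scales is tiny, so the DL draws place far more mass in a small neighborhood of zero than the Laplace prior, while the heavy mixture tail preserves occasional large values.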
3. Theoretical Guarantees and Posterior Concentration
DL priors achieve minimax posterior contraction rates in sparse high-dimensional settings. In the normal means model, with $q_n$ true nonzero coordinates in $\theta_0$, set $s_n = \sqrt{q_n \log(n/q_n)}$. Under $a = n^{-(1+\beta)}$ (typically with small $\beta > 0$), the posterior contracts at rate $s_n$:
$$\Pi\left(\|\theta - \theta_0\|_2 \ge M s_n \mid y\right) \to 0,$$
provided $q_n = o(n)$ (Bhattacharya et al., 2014). The induced support size for $\theta$ remains proportional to $q_n$, avoiding over-selection of spurious features.
In linear regression ($y = X\beta + \epsilon$), consistent posterior contraction and selection hold under growth conditions on the dimension $p_n$ relative to $n$ and a suitably shrinking Dirichlet concentration $a_n$ (Zhang et al., 2016).
4. Posterior Computation: Gibbs Samplers and Algorithmic Corrections
DL posterior draws are computable via blocked Gibbs sampling exploiting the scale-mixture Gaussian and normalized random measure identities. Key update steps:
- $\theta$ update: jointly normal, with coordinates conditionally independent given $(\psi, \phi, \tau)$ or $\lambda$.
- Local scale update: $1/\psi_j$ sampled as inverse-Gaussian.
- Global scale update: $\tau$ as generalized inverse-Gaussian (GIG).
- Simplex weights update: via normalized random measure draws, i.e., $T_j \sim \mathrm{GIG}(a-1, 1, 2|\theta_j|)$, $\phi_j = T_j / \sum_k T_k$ (Bhattacharya et al., 2012, Gruber et al., 16 Aug 2025).
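One blocked iteration of these updates in the normal means model can be sketched as follows. This is a non-optimized illustration under the hierarchy above, drawing the simplex weights before the global scale; the helper `gig` reparameterizes `scipy.stats.geninvgauss` (two-argument form) to the three-parameter GIG$(p, a, b)$ used in the literature, and the small epsilon guard is an implementation convenience, not part of the published algorithms:

```python
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(2)

def gig(p, a, b, rng=rng):
    """Draw from GIG(p, a, b), density proportional to x^(p-1) exp(-(a*x + b/x)/2)."""
    return geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(b / a), random_state=rng)

def dl_gibbs_step(y, theta, phi, tau, a):
    n = len(y)
    abs_t = np.abs(theta) + 1e-10          # guard against exact zeros
    # 1. Simplex weights: T_j ~ GIG(a - 1, 1, 2|theta_j|), phi_j = T_j / sum_k T_k.
    T = np.array([gig(a - 1.0, 1.0, 2.0 * t) for t in abs_t])
    phi = T / T.sum()
    # 2. Global scale: tau ~ GIG(n*a - n, 1, 2 * sum_j |theta_j| / phi_j).
    tau = gig(n * a - n, 1.0, 2.0 * np.sum(abs_t / phi))
    # 3. Local scales: 1/psi_j ~ InverseGaussian(phi_j * tau / |theta_j|, 1).
    psi = 1.0 / rng.wald(phi * tau / abs_t, 1.0)
    # 4. Means: theta_j ~ N(v_j * y_j, v_j), v_j = (1 + 1/(psi_j phi_j^2 tau^2))^-1.
    v = 1.0 / (1.0 + 1.0 / (psi * phi**2 * tau**2))
    theta = rng.normal(v * y, np.sqrt(v))
    return theta, phi, tau, psi
```

All four steps use standard distributions, which is what makes the per-iteration cost linear in $n$ in this model.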
An error in the original Bhattacharya et al. (2015) sampler concerned the order of updates: sampling $\tau$ before $\phi$, while the $\phi$ update used a conditional in which $\tau$ had already been integrated out, violates the marginal-conditional factorization and leads to an incorrect stationary distribution (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025). Corrected algorithms respect the sequence $\phi \to \tau$ within the joint block for $(\phi, \tau)$ (or use the $\lambda$ parameterization for independent updates).
Per-iteration cost is $O(n)$ in the normal means model, dominated by fast inverse-Gaussian and GIG draws. Mixing is efficient, but global shrinkage parameters (e.g., $\tau$, or the $\lambda_j$ under the alternative parameterization) require diagnostic monitoring.
5. Applications in Regression, Variable Selection, and Performance
The DL prior supports high-dimensional linear regression, sparse estimation, and variable selection tasks. The regression model
$$y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I_n),$$
assigns each coefficient $\beta_j$ a DL prior, facilitating sparse recovery via penalized credible regions:
- Compute joint posterior credible ellipsoids;
- Find the sparsest $\beta$ within the region via $\ell_0$-type minimization or sparsity-inducing penalization;
- Hyperparameters (e.g., the Dirichlet concentration $a$) can be objectively tuned by minimizing the discrepancy between the induced distribution and a target Beta prior (Zhang et al., 2016).
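The credible-region selection step can be sketched as a greedy search for the sparsest coefficient vector whose Mahalanobis distance from the posterior mean stays inside a joint chi-square ellipsoid. This is a simplified stand-in for the penalized-optimization formulations in Zhang et al. (2016); the function name and greedy ordering are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def sparsest_in_region(post_mean, post_cov, level=0.95):
    """Greedily zero out the smallest-|mean| coefficients while the
    resulting vector stays inside the joint posterior credible ellipsoid."""
    p = len(post_mean)
    prec = np.linalg.inv(post_cov)
    radius = chi2.ppf(level, df=p)
    beta = post_mean.copy()
    # Try zeroing coordinates in order of increasing |posterior mean|.
    for j in np.argsort(np.abs(post_mean)):
        trial = beta.copy()
        trial[j] = 0.0
        d = trial - post_mean
        if d @ prec @ d <= radius:
            beta = trial
        else:
            break
    return beta

# Toy posterior: two strong signals, three near-zero coefficients.
mean = np.array([3.0, -2.5, 0.05, -0.02, 0.01])
cov = 0.01 * np.eye(5)
selected = sparsest_in_region(mean, cov)
print(selected)
```

On this toy posterior the three near-zero coefficients fit inside the ellipsoid and are set exactly to zero, while the two strong signals survive, which is the qualitative behavior the penalized credible-region approach targets.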
Simulation and microarray analyses confirm that DL priors outperform Bayesian Lasso, Laplace, and horseshoe alternatives in terms of mean squared error, selection support recovery, and interpretability, especially in settings where signals are sparse and correlations moderate to high (Bhattacharya et al., 2014, Zhang et al., 2016).
6. Practical Recommendations and Implementation Notes
For efficient and correct computation:
- Always use the corrected Gibbs sampler or the simplified $\lambda$-based scheme to correctly target the posterior (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025).
- Employ specialized C/C++ routines for GIG and inverse-Gaussian draws.
- Monitor mixing of global shrinkage parameters; consider re-parameterizations or thinning to improve autocorrelations.
- For small Dirichlet concentration, e.g., $a = 1/n$ with $n$ large, numerical instabilities may arise; the alternative $\lambda$-parametrization tends to be more robust.
- Parallelize coordinatewise updates for large $n$ or $p$.
- Validate implementations via "Getting it right" tests (e.g., Geweke-style joint distribution checks) or by comparison with exact small-$n$ marginal samplers.
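The small-$a$ instability mentioned above is easy to reproduce: for $a \ll 1$, direct Gamma draws underflow to exact zeros in double precision, whereas working with $\log \lambda$ via the standard boosting identity $\lambda = G \cdot U^{1/a}$ with $G \sim \mathrm{Gamma}(a+1)$, $U \sim \mathrm{Unif}(0,1)$ stays finite. This sketches one common workaround, not the specific fix used in the cited packages:

```python
import numpy as np

rng = np.random.default_rng(3)
a, m = 1e-3, 10_000

# Direct draws: for a = 1e-3, roughly half underflow to exactly 0.0.
direct = rng.gamma(a, scale=2.0, size=m)
print("fraction of exact zeros in direct draws:", np.mean(direct == 0.0))

# Log-space draws via the boosting identity: log(lambda) stays finite,
# so downstream computations can work with log-scales throughout.
log_lam = (np.log(rng.gamma(a + 1.0, scale=2.0, size=m))
           + np.log(rng.uniform(size=m)) / a)
print("fraction of non-finite log draws:", np.mean(~np.isfinite(log_lam)))
```

Keeping local scales in log-space is the same spirit in which the $\lambda$-parametrization improves robustness: the quantities that matter can be manipulated without ever materializing underflowed values.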
An R package “DirLapl” implements the corrected sampler (Onorati et al., 7 Jul 2025).
7. Impact and Current Best Practices
DL priors blend computational tractability, theoretical optimality under sparsity, and flexibility via the global–local architecture. Best practices include setting $a = 1/n$ or $1/p$, using gamma global scales for closed-form updates, and validating MCMC chains against the target posterior (Bhattacharya et al., 2014, Gruber et al., 16 Aug 2025).
When embedded in regression or penalized credible region frameworks, DL priors yield consistent support recovery, improved prediction error, and are applicable to massive-dimensional data, including genetics and microarray analyses (Zhang et al., 2016). Theoretical results and corrected algorithms ensure reliability of uncertainty quantification and practical recommendations for robust variable selection.
A plausible implication is that, owing to the spike-at-zero and heavy tails, DL priors should be preferred in high-dimensional sparse modeling where both accurate zero selection and minimal shrinkage of true signals are required. All substantive theoretical properties and large-sample guarantees remain valid under corrected samplers (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025).