
Dirichlet–Laplace Priors in Sparse Inference

Updated 13 January 2026
  • Dirichlet–Laplace priors are global–local shrinkage methods designed for high-dimensional sparse Bayesian inference, combining simplex-constrained weights with Laplace mixtures.
  • They achieve optimal posterior contraction rates and robust variable selection in regression and normal means models via conjugate-friendly Gibbs sampling.
  • Practical implementations use corrected sampling algorithms and alternative parameterizations to enhance computational efficiency and accurate uncertainty quantification.

The Dirichlet–Laplace (DL) prior is a global–local shrinkage prior designed for high-dimensional Bayesian inference under sparsity. It combines a simplex-constrained vector of local scales (Dirichlet weights) with Laplace or Gaussian scale mixtures, producing pronounced shrinkage near zero while retaining heavy tails for large signals. The DL prior delivers optimal posterior contraction rates, efficient computation via conjugate-friendly Gibbs sampling, and robust variable selection in regression and normal means models. Theoretical advances and practical implementation strategies clarify its analytic properties and correct subtle errors in earlier simulation algorithms.

1. Hierarchical Model and Marginal Representation

Consider the normal means model $y_i = \theta_i + \varepsilon_i$, $\varepsilon_i \sim N(0,1)$, for $i = 1, \ldots, n$. The DL prior introduces three layers of latent variables:

  • Local scale parameters: $\psi_j \sim \mathrm{Exp}(1/2)$ for $j = 1, \ldots, n$.
  • Simplex-constrained weights: $\phi = (\phi_1, \ldots, \phi_n) \sim \mathrm{Dirichlet}(a, \ldots, a)$.
  • Global scale: $\tau \sim \mathrm{Gamma}(na, 1/2)$.

The hierarchical prior for $\theta_j$ is:

$$\theta_j \mid \psi_j, \phi_j, \tau \sim N(0, \psi_j \phi_j^2 \tau^2).$$

Integrating over $\psi_j$ produces double-exponential (Laplace) marginals $\mathrm{DE}(0, \phi_j \tau)$. Further marginalization over $\phi$ and $\tau$ yields a density for $\theta_j$ that is sharply peaked at zero with exponential or polynomial tails, depending on $a$ and the integration over the Dirichlet and Gamma components (Bhattacharya et al., 2014, Bhattacharya et al., 2012).
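The Laplace marginal from the Exp(1/2) mixing step can be checked numerically. The sketch below (with an illustrative unit scale $b$) verifies that mixing $N(0, \psi b^2)$ over $\psi \sim \mathrm{Exp}(1/2)$ matches the moments of $\mathrm{DE}(0, b)$:

```python
import numpy as np

# Monte Carlo check of the Laplace marginal: mixing N(0, psi * b^2) over
# psi ~ Exp(rate 1/2) should give theta ~ DE(0, b), which has variance 2*b^2
# and median absolute value b*log(2).
rng = np.random.default_rng(0)
n_draws, b = 200_000, 1.0

psi = rng.exponential(scale=2.0, size=n_draws)   # Exp(rate 1/2) has mean 2
theta = rng.normal(0.0, np.sqrt(psi) * b)        # theta | psi ~ N(0, psi * b^2)

print(theta.var())               # close to 2 * b^2 = 2
print(np.median(np.abs(theta)))  # close to b * log(2), about 0.693
```

The mixing rate 1/2 is exactly what makes the marginal variance $\mathbb{E}[\psi]\, b^2 = 2b^2$ agree with the $\mathrm{DE}(0,b)$ variance.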

An alternative parameterization replaces the Dirichlet and Gamma components with independent Gamma variables: letting $\lambda_j = \phi_j \tau$ with $\lambda_j \sim \mathrm{Gamma}(a, 1/2)$, so

$$\theta_j \mid \psi_j, \lambda_j \sim N(0, \psi_j \lambda_j^2).$$

This version facilitates implementation and eliminates redundancy (Gruber et al., 16 Aug 2025).
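Prior simulation under this $\lambda$-parameterization is direct; a minimal sketch (the function name and toy values are illustrative, not from the cited papers):

```python
import numpy as np

def rdl_prior(n, a, rng):
    """Draw theta ~ DL(a) via the independent-Gamma parameterization:
    psi_j ~ Exp(rate 1/2), lambda_j ~ Gamma(shape a, rate 1/2),
    theta_j | psi_j, lambda_j ~ N(0, psi_j * lambda_j^2)."""
    psi = rng.exponential(scale=2.0, size=n)     # numpy takes scale = 1/rate
    lam = rng.gamma(shape=a, scale=2.0, size=n)  # Gamma(a, rate 1/2)
    return rng.normal(0.0, np.sqrt(psi) * lam)

rng = np.random.default_rng(1)
theta = rdl_prior(1000, 0.1, rng)  # small a: most draws near zero, a few large
```

With small $a$ the draws display the spike-and-heavy-tail shape discussed below: the median of $|\theta_j|$ sits far below the mean.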

2. Global–Local Shrinkage Mechanism

DL priors effect shrinkage by combining a sum-to-one constraint on the local weights (the simplex) with a global scale. This architecture produces "joint shrinkage", suppressing all but a few signals via the spike at zero. For very sparse signals (e.g., under $a = 1/n$), the DL prior matches the concentration properties of discrete spike-and-slab mixtures, yet retains computational tractability (Bhattacharya et al., 2012).

Comparative properties:

  • Laplace (Bayesian Lasso): Exponential peak at zero, lighter tails — tends to over-shrink large coefficients.
  • Horseshoe: Heavy tails (order $1/\beta^2$) but a weaker central spike.
  • Dirichlet–Laplace: Adjustable spike via $a$ (small $a$ gives near point-mass behavior), with exponential-to-Pareto tails (Zhang et al., 2016).

3. Theoretical Guarantees and Posterior Concentration

DL priors achieve minimax posterior contraction rates in sparse high-dimensional settings. In the normal means model with $q_n$ true nonzero coordinates in $\theta^0$, set $s_n = q_n \log(n/q_n)$. Under $a_n = n^{-(1+\delta)}$ (typically $a = 1/n$), the posterior contracts at rate $s_n$:

$$\mathbb{E}_{\theta^0} \left[ \Pi\left( \|\theta - \theta^0\|_2^2 \leq M s_n \mid y \right) \right] \rightarrow 1 \quad \text{as } n \rightarrow \infty,$$

provided $\|\theta^0\|_2^2 \leq q_n (\log n)^4$ (Bhattacharya et al., 2014). The induced support size $|\mathrm{supp}_{\epsilon_n}(\theta)|$ at threshold $\epsilon_n = s_n/n$ remains proportional to $q_n$, avoiding over-selection of spurious features.

In linear regression ($p_n \to \infty$), consistent posterior contraction and selection hold under $q_n = o(n/\log n)$ and a suitably shrinking $a_n$ (Zhang et al., 2016).
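For concreteness, the rate quantities above are easy to compute; with the illustrative values $n = 10^4$ and $q_n = 50$ (my choice, not from the cited papers):

```python
import numpy as np

# Contraction-rate quantities from the theory above, for illustrative n and q_n.
n, q_n = 10_000, 50
s_n = q_n * np.log(n / q_n)   # minimax rate: q_n * log(n / q_n)
eps_n = s_n / n               # support threshold eps_n = s_n / n
print(round(s_n, 1), round(eps_n, 4))  # 264.9 0.0265
```

So the posterior squared-error ball has radius of order $265$, far below the $n = 10^4$ attainable without sparsity.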

4. Posterior Computation: Gibbs Samplers and Algorithmic Corrections

DL posterior draws are computable via blocked Gibbs sampling exploiting the scale-mixture Gaussian and normalized random measure identities. Key update steps:

  • $\theta$ update: Jointly normal, conditionally independent given $\psi_j, \phi_j, \tau$ or $\psi_j, \lambda_j$.
  • Local scale update: $\psi_j \mid \theta_j, \phi_j, \tau$ sampled as inverse-Gaussian.
  • Global scale update: $\tau \mid \phi, \theta$ as generalized inverse-Gaussian.
  • Simplex weights update: $\phi_j$ via normalized random measure draws, i.e., $T_j \sim \mathrm{giG}(a-1, 1, 2|\theta_j|)$, $\phi_j = T_j/\sum_k T_k$ (Bhattacharya et al., 2012, Gruber et al., 16 Aug 2025).

An error in the original Bhattacharya et al. (2015) sampler concerned the order of updates: sampling $\psi$ before $(\phi, \tau)$, while using conditionals that had already integrated out $\psi$, violates the marginal-conditional factorization and leads to an incorrect stationary distribution (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025). Corrected algorithms respect the sequence $\phi \rightarrow \tau \rightarrow \psi$ within blocks (or use the $\lambda_j$ parameterization for independent updates).

Per-iteration cost is $O(n)$, dominated by fast inverse-Gaussian and generalized inverse-Gaussian (GIG) draws. Mixing is efficient, but global shrinkage parameters (e.g., $\tau$ or $\sum_j \lambda_j$) require diagnostic monitoring.
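The corrected update order can be sketched as follows for the normal means model. This is an illustrative implementation assuming the conditionals quoted above (scipy's `geninvgauss` supplies the GIG draws; the `giG` helper, guard constant, and toy data are mine, not from the cited papers):

```python
import numpy as np
from scipy.stats import geninvgauss

def giG(p, a, b, rng):
    """Draw from giG(p, a, b), density prop. to x^(p-1) exp(-(a*x + b/x)/2),
    by rescaling scipy's one-parameter geninvgauss(p, omega)."""
    return np.sqrt(b / a) * geninvgauss.rvs(p, np.sqrt(a * b), random_state=rng)

def dl_gibbs_step(y, theta, a, rng):
    """One blocked Gibbs sweep for y_j = theta_j + N(0,1) under a DL(a) prior,
    respecting the corrected order phi -> tau -> psi -> theta."""
    n = len(y)
    abs_t = np.abs(theta) + 1e-12                        # guard exact zeros
    # phi | theta:  T_j ~ giG(a - 1, 1, 2|theta_j|),  phi_j = T_j / sum_k T_k
    T = np.array([giG(a - 1.0, 1.0, 2.0 * t, rng) for t in abs_t])
    phi = T / T.sum()
    # tau | phi, theta ~ giG(n(a - 1), 1, 2 * sum_j |theta_j| / phi_j)
    tau = giG(n * (a - 1.0), 1.0, 2.0 * np.sum(abs_t / phi), rng)
    # psi | theta, phi, tau:  1/psi_j ~ InvGaussian(phi_j * tau / |theta_j|, 1)
    psi = 1.0 / rng.wald(phi * tau / abs_t, 1.0)
    # theta | y, psi, phi, tau: conjugate normal with prior variance v_j
    v = psi * phi**2 * tau**2
    w = v / (1.0 + v)
    return rng.normal(w * y, np.sqrt(w)), phi, tau

# Toy run: 3 signals at 5, 17 nulls.
rng = np.random.default_rng(7)
y = np.concatenate([np.full(3, 5.0), np.zeros(17)]) + rng.normal(size=20)
theta, draws = y.copy(), []
for it in range(250):
    theta, phi, tau = dl_gibbs_step(y, theta, a=0.5, rng=rng)
    if it >= 50:
        draws.append(theta)
post_mean = np.mean(draws, axis=0)
```

On the toy data the posterior means of the null coordinates are pulled toward zero while the three signals remain near their observed values.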

5. Applications in Regression, Variable Selection, and Performance

The DL prior supports high-dimensional linear regression, sparse estimation, and variable selection tasks. In the regression model

$$Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n),$$

one assigns each coefficient $\beta_j$ a DL prior, facilitating sparse recovery via penalized credible regions:

  • Compute joint posterior credible ellipsoids;
  • Find the sparsest $\beta$ in the region via $L_0$ minimization or penalization;
  • Hyperparameters (e.g., the Dirichlet concentration $a$) can be tuned objectively by minimizing the discrepancy between the induced $R^2$ distribution and a target Beta prior (Zhang et al., 2016).
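The credible-region selection step can be sketched with a greedy heuristic (an approximation to the exact $L_0$ search, not the procedure of the cited papers; the function name and toy numbers are illustrative):

```python
import numpy as np

def sparsest_in_ellipsoid(mu, Sigma_inv, c):
    """Greedy approximation: starting from the posterior mean mu, zero out
    coefficients (smallest |mu_j| first) as long as the sparsified vector
    stays inside the ellipsoid (beta - mu)' Sigma_inv (beta - mu) <= c."""
    beta = mu.astype(float).copy()
    for j in np.argsort(np.abs(mu)):
        trial = beta.copy()
        trial[j] = 0.0
        d = trial - mu
        if d @ Sigma_inv @ d <= c:   # still inside the credible region?
            beta = trial
    return beta

mu = np.array([3.0, 0.1, 0.05])   # toy posterior mean
Sigma_inv = np.eye(3)             # toy posterior precision
beta_sparse = sparsest_in_ellipsoid(mu, Sigma_inv, c=1.0)
print(beta_sparse)                # [3. 0. 0.]
```

Here the two small coefficients can be zeroed without leaving the unit ellipsoid, but zeroing the large one would cost a squared distance of about 9.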

Simulation and microarray analyses confirm that DL priors outperform Bayesian Lasso, Laplace, and horseshoe alternatives in mean squared error, support recovery, and interpretability, especially when signals are sparse and correlations are moderate to high (Bhattacharya et al., 2014, Zhang et al., 2016).

6. Practical Recommendations and Implementation Notes

For efficient and correct computation:

  • Always use the corrected Gibbs sampler or the simplified $\lambda$-based scheme to correctly target the posterior (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025).
  • Employ specialized C/C++ routines for GIG and inverse-Gaussian draws.
  • Monitor mixing of global shrinkage parameters; consider re-parameterizations or thinning to improve autocorrelations.
  • For $a \ll 1$, e.g., $a = 1/n$, numerical instabilities may arise; the alternative $\lambda$-parameterization tends to be more robust.
  • Parallelize coordinatewise updates for large $n$.
  • Validate implementations with “getting it right” tests or by comparison against small-$n$ marginal samplers.
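As a concrete instance of the $a \ll 1$ instability: direct Gamma draws with tiny shape can underflow to exact zeros, whereas sampling $\log \lambda_j$ through the identity $\mathrm{Gamma}(a) \stackrel{d}{=} \mathrm{Gamma}(a+1)\, U^{1/a}$ stays stable. The helper below is an illustrative sketch of this standard trick, not code from the cited papers:

```python
import numpy as np
from scipy.special import digamma

def log_gamma_small_shape(a, rate, size, rng):
    """Return log(X) for X ~ Gamma(shape a, rate), computed in log space via
    X = G * U^(1/a) with G ~ Gamma(a + 1, rate) and U ~ Uniform(0, 1), which
    avoids the underflow of drawing X directly when a is tiny."""
    g = rng.gamma(a + 1.0, 1.0 / rate, size=size)   # numpy takes scale = 1/rate
    u = rng.uniform(size=size)
    return np.log(g) + np.log(u) / a

rng = np.random.default_rng(3)
a = 1e-3
logs = log_gamma_small_shape(a, rate=0.5, size=40_000, rng=rng)
# Sanity check: E[log X] = digamma(a) - log(rate) for Gamma(a, rate).
print(logs.mean(), digamma(a) - np.log(0.5))
```

With $a = 10^{-3}$ the typical $\log \lambda_j$ is near $-1000$, i.e., $\lambda_j$ itself is far below double-precision range, so working in log space is essential.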

An R package “DirLapl” implements the corrected sampler (Onorati et al., 7 Jul 2025).

7. Impact and Current Best Practices

DL priors blend computational tractability, theoretical optimality under sparsity, and flexibility via the global–local architecture. Best practices include setting $a = 1/n$ or $a = 1/p$, using Gamma global scales for closed-form updates, and validating MCMC chains against the target posterior (Bhattacharya et al., 2014, Gruber et al., 16 Aug 2025).

When embedded in regression or penalized credible region frameworks, DL priors yield consistent support recovery and improved prediction error, and scale to massive-dimensional data, including genetics and microarray analyses (Zhang et al., 2016). Theoretical results and corrected algorithms ensure reliable uncertainty quantification and support robust variable selection in practice.

A plausible implication is that, owing to the spike-at-zero and heavy tails, DL priors should be preferred in high-dimensional sparse modeling where both accurate zero selection and minimal shrinkage of true signals are required. All substantive theoretical properties and large-sample guarantees remain valid under corrected samplers (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025).
