Dirichlet–Laplace Priors in Sparse Inference
- Dirichlet–Laplace priors are global–local shrinkage methods designed for high-dimensional sparse Bayesian inference, combining simplex-constrained weights with Laplace mixtures.
- They achieve optimal posterior contraction rates and robust variable selection in regression and normal means models via conjugate-friendly Gibbs sampling.
- Practical implementations use corrected sampling algorithms and alternative parameterizations to enhance computational efficiency and accurate uncertainty quantification.
The Dirichlet–Laplace (DL) prior is a global–local shrinkage prior designed for high-dimensional Bayesian inference under sparsity. It combines a simplex-constrained vector of local scales (Dirichlet weights) with Laplace or Gaussian scale mixtures, producing pronounced shrinkage near zero while retaining heavy tails for large signals. The DL prior delivers optimal posterior contraction rates, efficient computation via conjugate-friendly Gibbs sampling, and robust variable selection in regression and normal means models. Theoretical advances and practical implementation strategies clarify its analytic properties and correct subtle errors in earlier simulation algorithms.
1. Hierarchical Model and Marginal Representation
Consider the normal means model, $y_j = \theta_j + \epsilon_j$, $\epsilon_j \sim N(0, 1)$, for $j = 1, \dots, n$. The DL prior introduces three layers of latent variables:
- Local scale parameters: $\psi_j \sim \mathrm{Exp}(1/2)$ for $j = 1, \dots, n$.
- Simplex-constrained weights: $\phi = (\phi_1, \dots, \phi_n) \sim \mathrm{Dir}(a, \dots, a)$.
- Global scale: $\tau \sim \mathrm{Gamma}(na, 1/2)$.
The hierarchical prior for $\theta_j$ is:
$$\theta_j \mid \psi_j, \phi_j, \tau \sim N(0, \psi_j \phi_j^2 \tau^2).$$
Integrating over $\psi_j$ produces double-exponential (Laplace) marginals, $\theta_j \mid \phi_j, \tau \sim \mathrm{DE}(\phi_j \tau)$. Further marginalization over $\phi$ and $\tau$ yields a density for $\theta_j$ sharply peaked at zero and with exponential or polynomial tails, depending on $a$ and the integration over Dirichlet and Gamma components (Bhattacharya et al., 2014, Bhattacharya et al., 2012).
An alternative parameterization replaces the Dirichlet and Gamma layers with independent Gamma variables: letting $\lambda_j \sim \mathrm{Gamma}(a, 1/2)$ independently, set $\phi_j = \lambda_j / \sum_k \lambda_k$ and $\tau = \sum_k \lambda_k$, so
$$\theta_j \mid \lambda_j \sim \mathrm{DE}(\lambda_j).$$
This version facilitates implementation and eliminates redundancy (Gruber et al., 16 Aug 2025).
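Under the $\lambda$ parameterization, a prior draw requires only independent Gamma and Laplace variates. A minimal sketch in Python/NumPy (the function name is illustrative, not from the cited packages):

```python
import numpy as np

rng = np.random.default_rng(0)

def dl_prior_draws(n, a, size=1, rng=rng):
    """Draw theta ~ DL(a) via the lambda parameterization:
    lambda_j ~ Gamma(a, rate 1/2), then theta_j | lambda_j ~ DE(lambda_j)."""
    # NumPy's gamma takes a *scale* parameter, so rate 1/2 means scale 2.
    lam = rng.gamma(shape=a, scale=2.0, size=(size, n))
    # Laplace (double-exponential) with per-coordinate scale lambda_j.
    return rng.laplace(loc=0.0, scale=lam)

theta = dl_prior_draws(n=1000, a=0.5, size=1)
```

Because the $\lambda_j$ are independent, draws vectorize trivially, which is part of why this parameterization simplifies implementation.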
2. Global–Local Shrinkage Mechanism
DL priors effect shrinkage by combining a sum-to-one constraint on local weights (the simplex) with a global scale. This architecture produces "joint shrinkage", suppressing all but a few signals via the spike at zero. For very sparse signals (e.g., $\ell_0$-sparse vectors with $q_n = o(n)$ nonzero entries), the DL prior matches the concentration properties of discrete spike-and-slab mixtures, yet retains computational tractability (Bhattacharya et al., 2012).
Comparative properties:
- Laplace (Bayesian Lasso): exponential peak at zero with lighter tails; tends to over-shrink large coefficients.
- Horseshoe: heavy polynomial (Cauchy-like) tails but a weaker central spike.
- Dirichlet–Laplace: adjustable spike via $a$ (small $a$ gives near point-mass behavior at zero), with exponential-to-Pareto tails (Zhang et al., 2016).
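The difference in spike behavior is easy to check empirically by comparing the prior mass near zero of a standard Laplace prior against DL draws with small $a$ (a Monte Carlo sketch; the threshold and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100_000

# Standard Laplace(0, 1) prior draws.
laplace_draws = rng.laplace(0.0, 1.0, size=m)

# DL(a) prior draws via the lambda parameterization: lambda ~ Gamma(a, rate 1/2).
a = 0.1
lam = rng.gamma(a, scale=2.0, size=m)
dl_draws = rng.laplace(0.0, lam)

eps = 0.01
lap_frac = np.mean(np.abs(laplace_draws) < eps)
dl_frac = np.mean(np.abs(dl_draws) < eps)
print(f"P(|theta| < {eps}): Laplace {lap_frac:.3f}, DL(a={a}) {dl_frac:.3f}")
```

With $a = 0.1$, a large share of the Gamma local scales is tiny, so the DL draws place far more mass in a small neighborhood of zero than the Laplace prior, while the heavy mixture tail preserves occasional large values.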
3. Theoretical Guarantees and Posterior Concentration
DL priors achieve minimax posterior contraction rates in sparse high-dimensional settings. In the normal means model, with $q_n$ true nonzero coordinates in $\theta_0$, set $s_n = \sqrt{q_n \log(n/q_n)}$. Under $a = n^{-(1+\beta)}$ (typically with small $\beta > 0$), the posterior contracts at rate $s_n$:
$$\Pi\left(\|\theta - \theta_0\|_2 \ge M s_n \mid y\right) \to 0,$$
provided $q_n = o(n)$ (Bhattacharya et al., 2014). The induced support size for $\theta$ remains proportional to $q_n$, avoiding over-selection of spurious features.
In linear regression ($y = X\beta + \epsilon$), consistent posterior contraction and selection hold under growth conditions on the dimension $p_n$ relative to $n$ and a suitably shrinking Dirichlet concentration $a_n$ (Zhang et al., 2016).
4. Posterior Computation: Gibbs Samplers and Algorithmic Corrections
DL posterior draws are computable via blocked Gibbs sampling exploiting the scale-mixture Gaussian and normalized random measure identities. Key update steps:
- $\theta$ update: jointly normal, with coordinates conditionally independent given $(\psi, \phi, \tau)$ or $\lambda$.
- Local scale update: $1/\psi_j$ sampled as inverse-Gaussian.
- Global scale update: $\tau$ as generalized inverse-Gaussian (GIG).
- Simplex weights update: via normalized random measure draws, i.e., $T_j \sim \mathrm{GIG}(a-1, 1, 2|\theta_j|)$, $\phi_j = T_j / \sum_k T_k$ (Bhattacharya et al., 2012, Gruber et al., 16 Aug 2025).
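One blocked iteration of these updates in the normal means model can be sketched as follows. This is a non-optimized illustration under the hierarchy above, drawing the simplex weights before the global scale; the helper `gig` reparameterizes `scipy.stats.geninvgauss` (two-argument form) to the three-parameter GIG$(p, a, b)$ used in the literature, and the small epsilon guard is an implementation convenience, not part of the published algorithms:

```python
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(2)

def gig(p, a, b, rng=rng):
    """Draw from GIG(p, a, b), density proportional to x^(p-1) exp(-(a*x + b/x)/2)."""
    return geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(b / a), random_state=rng)

def dl_gibbs_step(y, theta, phi, tau, a):
    n = len(y)
    abs_t = np.abs(theta) + 1e-10          # guard against exact zeros
    # 1. Simplex weights: T_j ~ GIG(a - 1, 1, 2|theta_j|), phi_j = T_j / sum_k T_k.
    T = np.array([gig(a - 1.0, 1.0, 2.0 * t) for t in abs_t])
    phi = T / T.sum()
    # 2. Global scale: tau ~ GIG(n*a - n, 1, 2 * sum_j |theta_j| / phi_j).
    tau = gig(n * a - n, 1.0, 2.0 * np.sum(abs_t / phi))
    # 3. Local scales: 1/psi_j ~ InverseGaussian(phi_j * tau / |theta_j|, 1).
    psi = 1.0 / rng.wald(phi * tau / abs_t, 1.0)
    # 4. Means: theta_j ~ N(v_j * y_j, v_j), v_j = (1 + 1/(psi_j phi_j^2 tau^2))^-1.
    v = 1.0 / (1.0 + 1.0 / (psi * phi**2 * tau**2))
    theta = rng.normal(v * y, np.sqrt(v))
    return theta, phi, tau, psi
```

All four steps use standard distributions, which is what makes the per-iteration cost linear in $n$ in this model.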
An error in the original Bhattacharya et al. (2015) sampler concerned the order of updates: sampling $\tau$ before $\phi$, while the $\phi$ update used a conditional in which $\tau$ had already been integrated out, violates the marginal-conditional factorization and leads to an incorrect stationary distribution (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025). Corrected algorithms respect the sequence $\phi \to \tau$ within the joint block for $(\phi, \tau)$ (or use the $\lambda$ parameterization for independent updates).
Per-iteration cost is $O(n)$ in the normal means model, dominated by fast inverse-Gaussian and GIG draws. Mixing is efficient, but global shrinkage parameters (e.g., $\tau$, or the $\lambda_j$ under the alternative parameterization) require diagnostic monitoring.
5. Applications in Regression, Variable Selection, and Performance
The DL prior supports high-dimensional linear regression, sparse estimation, and variable selection tasks. The regression model
$$y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I_n),$$
assigns each coefficient $\beta_j$ a DL prior, facilitating sparse recovery via penalized credible regions:
- Compute joint posterior credible ellipsoids;
- Find the sparsest $\beta$ within the region via $\ell_0$-type minimization or sparsity-inducing penalization;
- Hyperparameters (e.g., the Dirichlet concentration $a$) can be objectively tuned by minimizing the discrepancy between the induced distribution and a target Beta prior (Zhang et al., 2016).
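The credible-region selection step can be sketched as a greedy search for the sparsest coefficient vector whose Mahalanobis distance from the posterior mean stays inside a joint chi-square ellipsoid. This is a simplified stand-in for the penalized-optimization formulations in Zhang et al. (2016); the function name and greedy ordering are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def sparsest_in_region(post_mean, post_cov, level=0.95):
    """Greedily zero out the smallest-|mean| coefficients while the
    resulting vector stays inside the joint posterior credible ellipsoid."""
    p = len(post_mean)
    prec = np.linalg.inv(post_cov)
    radius = chi2.ppf(level, df=p)
    beta = post_mean.copy()
    # Try zeroing coordinates in order of increasing |posterior mean|.
    for j in np.argsort(np.abs(post_mean)):
        trial = beta.copy()
        trial[j] = 0.0
        d = trial - post_mean
        if d @ prec @ d <= radius:
            beta = trial
        else:
            break
    return beta

# Toy posterior: two strong signals, three near-zero coefficients.
mean = np.array([3.0, -2.5, 0.05, -0.02, 0.01])
cov = 0.01 * np.eye(5)
selected = sparsest_in_region(mean, cov)
print(selected)
```

On this toy posterior the three near-zero coefficients fit inside the ellipsoid and are set exactly to zero, while the two strong signals survive, which is the qualitative behavior the penalized credible-region approach targets.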
Simulation and microarray analyses confirm that DL priors outperform Bayesian Lasso, Laplace, and horseshoe alternatives in terms of mean squared error, selection support recovery, and interpretability, especially in settings where signals are sparse and correlations moderate to high (Bhattacharya et al., 2014, Zhang et al., 2016).
6. Practical Recommendations and Implementation Notes
For efficient and correct computation:
- Always use the corrected Gibbs sampler or the simplified $\lambda$-based scheme to correctly target the posterior (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025).
- Employ specialized C/C++ routines for GIG and inverse-Gaussian draws.
- Monitor mixing of global shrinkage parameters; consider re-parameterizations or thinning to improve autocorrelations.
- For small Dirichlet concentration, e.g., $a = 1/n$ with $n$ large, numerical instabilities may arise; the alternative $\lambda$-parametrization tends to be more robust.
- Parallelize coordinatewise updates for large $n$ or $p$.
- Validate implementations via "Getting it right" tests (e.g., Geweke-style joint distribution checks) or by comparison with exact small-$n$ marginal samplers.
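The small-$a$ instability mentioned above is easy to reproduce: for $a \ll 1$, direct Gamma draws underflow to exact zeros in double precision, whereas working with $\log \lambda$ via the standard boosting identity $\lambda = G \cdot U^{1/a}$ with $G \sim \mathrm{Gamma}(a+1)$, $U \sim \mathrm{Unif}(0,1)$ stays finite. This sketches one common workaround, not the specific fix used in the cited packages:

```python
import numpy as np

rng = np.random.default_rng(3)
a, m = 1e-3, 10_000

# Direct draws: for a = 1e-3, roughly half underflow to exactly 0.0.
direct = rng.gamma(a, scale=2.0, size=m)
print("fraction of exact zeros in direct draws:", np.mean(direct == 0.0))

# Log-space draws via the boosting identity: log(lambda) stays finite,
# so downstream computations can work with log-scales throughout.
log_lam = (np.log(rng.gamma(a + 1.0, scale=2.0, size=m))
           + np.log(rng.uniform(size=m)) / a)
print("fraction of non-finite log draws:", np.mean(~np.isfinite(log_lam)))
```

Keeping local scales in log-space is the same spirit in which the $\lambda$-parametrization improves robustness: the quantities that matter can be manipulated without ever materializing underflowed values.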
An R package “DirLapl” implements the corrected sampler (Onorati et al., 7 Jul 2025).
7. Impact and Current Best Practices
DL priors blend computational tractability, theoretical optimality under sparsity, and flexibility via the global–local architecture. Best practices include setting $a = 1/n$ or $1/p$, using gamma global scales for closed-form updates, and validating MCMC chains against the target posterior (Bhattacharya et al., 2014, Gruber et al., 16 Aug 2025).
When embedded in regression or penalized credible region frameworks, DL priors yield consistent support recovery, improved prediction error, and are applicable to massive-dimensional data, including genetics and microarray analyses (Zhang et al., 2016). Theoretical results and corrected algorithms ensure reliability of uncertainty quantification and practical recommendations for robust variable selection.
A plausible implication is that, owing to the spike-at-zero and heavy tails, DL priors should be preferred in high-dimensional sparse modeling where both accurate zero selection and minimal shrinkage of true signals are required. All substantive theoretical properties and large-sample guarantees remain valid under corrected samplers (Gruber et al., 16 Aug 2025, Onorati et al., 7 Jul 2025).