Heavy-Tailed Stable Priors in Bayesian Modeling
- Heavy-tailed stable priors are probability measures based on α-stable distributions with slow polynomial tail decay that ensure robust modeling in noisy environments.
- They are constructed via random series expansions with infinite divisibility and Lévy-Khintchine formulations to guarantee convergence in infinite-dimensional spaces.
- Their applications span Bayesian inverse problems, high-dimensional regression, and deep learning, providing effective sparsity-inducing mechanisms and edge-preserving inference.
Heavy-tailed stable priors are probability measures, either finite- or infinite-dimensional, built from stable distributions with heavy polynomial tails, including the Cauchy and general α-stable laws. Their slow polynomial decay at infinity implies infinite variance for α < 2 and, for α ≤ 1, the absence of even a finite mean. They are of central importance in robust Bayesian modeling, high-dimensional feature selection, nonparametric regression, Bayesian inverse problems, and generative modeling, particularly in settings where underlying signals or data distributions exhibit heavy-tailed behavior. Their rigorous construction and statistical properties have been developed on both finite-dimensional spaces and infinite-dimensional function spaces, with mathematical formulations using Lévy-Khintchine and Karhunen–Loève-type expansions.
1. Definition and Properties of Heavy-Tailed Stable Priors
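A stable probability law of index $\alpha \in (0, 2]$ is defined via its characteristic function. For a random variable $u$ with skewness $\beta \in [-1, 1]$, scale $\gamma > 0$, and location $\delta \in \mathbb{R}$,
$\mathbb{E}\big[e^{i t u}\big] = \exp\left(i \delta t - |\gamma t|^\alpha\left[1 + i\beta \tan\left(\frac{\pi\alpha}{2}\right) (\mathrm{sign}\, t) (|\gamma t|^{1-\alpha} - 1)\right] \right),$
with explicit modifications for $\alpha = 1$ (which includes the Cauchy law) (Sullivan, 2016). For $\alpha < 2$ the tails satisfy
$\mathbb{P}\big[|u| > r\big] \sim C_{\alpha,\gamma}\, r^{-\alpha} \quad \text{as } r \to \infty,$
and only moments of order $p < \alpha$ exist: $\mathbb{E}[|u|^p] < \infty \iff p < \alpha.$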
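The characteristic function and the polynomial tail can be checked numerically in the symmetric case. The following is a minimal sketch, assuming $\beta = 0$, $\gamma = 1$, $\delta = 0$, for which the characteristic function reduces to $\exp(-|t|^\alpha)$. It uses the standard Chambers-Mallows-Stuck sampler, which is not taken from the cited references; the function name `sample_symmetric_stable`, the sample size, and the threshold `r` are illustrative choices.

```python
import numpy as np

def sample_symmetric_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for beta = 0, gamma = 1, delta = 0."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # random angle
    w = rng.exponential(1.0, size)                # unit exponential
    if alpha == 1.0:
        return np.tan(u)                          # standard Cauchy
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(0)
alpha = 1.5
x = sample_symmetric_stable(alpha, 200_000, rng)

# Empirical characteristic function at t = 1 versus exp(-|t|^alpha).
ecf = np.mean(np.cos(x))
print(ecf, np.exp(-1.0))

# Empirical tail ratio P(|u| > 2r) / P(|u| > r), which approaches 2^(-alpha).
r = 20.0
tail_ratio = np.mean(np.abs(x) > 2 * r) / np.mean(np.abs(x) > r)
print(tail_ratio, 2.0 ** (-alpha))
```

Comparing the tail ratio at two thresholds avoids having to know the constant in front of $r^{-\alpha}$; only the exponent is tested.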
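Infinite-dimensional stable priors are constructed by random series expansions of the form $u = \sum_{n \in \mathbb{N}} u_n \psi_n$, where $(\psi_n)_{n \in \mathbb{N}}$ is a normalized unconditional basis in a (quasi-)Banach space $\mathcal{U}$ and the $u_n$ are independent real-valued stable random variables. The scale parameters of the coefficients must decay fast enough for the series to converge in norm; the precise summability conditions are given in (Sullivan, 2016).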
Stable priors are infinitely divisible (ID). For $\alpha < 2$ their Lévy measure decays only polynomially at infinity, like $|\xi|^{-1-\alpha}$, so the tails of the law remain heavy, the variance is infinite, and the jumps of the associated Lévy process paths follow a power law (Hosseini, 2016).
2. Construction of Heavy-Tailed Stable Priors
Stable priors in infinite dimensions are realized through random series representations analogous to Karhunen-Loève expansions. Given a normalized unconditional Schauder basis $(\psi_n)_{n \in \mathbb{N}}$ of $\mathcal{U}$ that admits a $p$-frame upper bound, one draws independent stable coefficients $u_n$ and sets $u = \sum_{n \in \mathbb{N}} u_n \psi_n$ (Sullivan, 2016). Under summability conditions on the scale and location parameters of the coefficients, stated precisely in that reference, almost sure convergence in $\mathcal{U}$ follows.
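The series construction can be illustrated with a truncated draw. This is a sketch under stated assumptions rather than the construction from the cited work: the basis is taken to be the orthonormal sine basis $\psi_n(x) = \sqrt{2}\sin(n\pi x)$ on $[0, 1]$, the coefficients are $\gamma_n \xi_n$ with i.i.d. standard Cauchy $\xi_n$ ($\alpha = 1$), and the scale decay $\gamma_n = n^{-2}$ is a hypothetical choice made only so that the sum is visibly dominated by its leading terms.

```python
import numpy as np

def sample_series_prior(n_terms, decay, grid, rng):
    """Draw u(x) = sum_n gamma_n * xi_n * psi_n(x) with Cauchy xi_n."""
    n = np.arange(1, n_terms + 1)
    gamma = n ** (-decay)                       # assumed scale decay gamma_n = n^(-decay)
    xi = rng.standard_cauchy(n_terms)           # i.i.d. alpha = 1 stable draws
    psi = np.sqrt(2.0) * np.sin(np.pi * np.outer(n, grid))  # sine basis on [0, 1]
    coeffs = gamma * xi
    return coeffs @ psi, coeffs

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 257)
u, coeffs = sample_series_prior(n_terms=200, decay=2.0, grid=grid, rng=rng)
print(u.shape, np.max(np.abs(u)))
```

Repeating the draw with different seeds shows the characteristic behaviour of heavy-tailed series priors: most samples are moderate, while an occasional very large coefficient dominates the whole function.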
For heavy-tailed infinitely divisible (ID) priors motivated by $\ell_p$ regularization ($0 < p < 1$), the generalized-Gamma-based densities considered in (Hosseini, 2016), whose tails are of order $\exp(-|x|^p)$, decay sub-exponentially for $p < 1$, are infinitely divisible, and induce sparse, non-convex regularization in the prior. These constructions generalize to product priors on Banach spaces and extend naturally to pure-jump Lévy process priors on functions of bounded variation, where sample paths are piecewise-constant, edge-preserving analogues of total variation priors (Hosseini, 2016).
3. Well-Posedness and Regularity of Posteriors
The existence and uniqueness of posteriors under heavy-tailed stable priors are ensured under significantly weaker regularity conditions than for Gaussians. For a misfit $\Phi(u; y)$ (negative log-likelihood), sufficient conditions for well-posedness include a lower bound on $\Phi$, local Lipschitz continuity of $\Phi$, and control of likelihood perturbations in terms of the growth rate in the tails (Sullivan, 2016, Hosseini, 2016).
Unlike Gaussian priors ($\alpha = 2$), whose exponential integrability tolerates a misfit that is unbounded below at a quadratic rate, stable priors ($\alpha < 2$) have only polynomial-logarithmic integrability, so the misfit may decrease at most logarithmically at infinity. This compatibility between prior tails and likelihood growth is critical in ill-posed inverse problems and structured sparsity regimes (Sullivan, 2016).
For $\alpha$-stable priors, posteriors are always well-defined under additive Gaussian noise, since the misfit is then non-negative and a lower bound is all that normalization requires (Hosseini, 2016). However, total-variation or Hellinger stability of the posterior (i.e., Lipschitz dependence on data) is ruled out for stable laws with $\alpha < 2$, since they lack finite second moments (Hosseini, 2016).
4. Statistical and Computational Behavior
Key properties include:
- Robustness and Sparsity: Heavy-tailed priors (e.g., Cauchy, $\alpha$-stable) shrink noise coefficients strongly toward zero (the posterior concentrates near the origin) but leave large, data-supported coefficients nearly unshrunk because of their slow tail decay (Jiang et al., 2016).
- Sparsity-Inducing Mechanisms: Priors like the Student-$t$ (with the Cauchy as a special case) or log-$t$ priors combine ultra-heavy tails with sharp spikes at zero, enabling effective feature selection and nonconvex penalty landscapes in high-dimensional regression (Schmidt et al., 2018, Jiang et al., 2016).
- Sampling: Practical implementation is based on truncation of the random series expansion (e.g., keeping the first $N$ terms, with $N$ chosen according to the decay of the coefficient scales), with posterior inference via MCMC (e.g., Metropolis-Hastings, HMC, restricted Gibbs) (Sullivan, 2016, Jiang et al., 2016).
- Numerical Considerations: Heavy tails may induce occasional very large draws, requiring careful proposal scaling or forward solvers tolerant to parameter excursions (Sullivan, 2016).
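The shrinkage behaviour in the first bullet can be seen in a one-dimensional toy model. The sketch below assumes a single observation $y = \theta + \varepsilon$ with $\varepsilon \sim N(0, 1)$ and compares the posterior mean under a Cauchy prior with that under a Gaussian prior of the same scale, using a simple grid sum. The model, the scale `0.1`, and the grid are illustrative assumptions, not settings from the cited papers.

```python
import numpy as np

def posterior_mean(y, log_prior, theta):
    """Posterior mean of theta for y = theta + N(0, 1) noise, on a uniform grid."""
    log_post = -0.5 * (y - theta) ** 2 + log_prior(theta)
    w = np.exp(log_post - log_post.max())      # unnormalized posterior weights
    return np.sum(theta * w) / np.sum(w)       # grid spacing cancels in the ratio

scale = 0.1                                    # assumed prior scale
theta = np.linspace(-50.0, 50.0, 200_001)

def log_cauchy(t):
    return -np.log1p((t / scale) ** 2)         # Cauchy(0, scale) up to a constant

def log_gauss(t):
    return -0.5 * (t / scale) ** 2             # N(0, scale^2) up to a constant

for y in (0.5, 10.0):
    print(y, posterior_mean(y, log_cauchy, theta), posterior_mean(y, log_gauss, theta))
```

For the small observation both priors pull the estimate close to zero. For the large observation the Gaussian prior still shrinks almost completely, whereas the Cauchy prior returns an estimate close to the data.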
5. Applications in Inverse Problems, Regression, and Machine Learning
Bayesian Inverse Problems
In infinite-dimensional inverse problems (e.g., PDE inversion), heavy-tailed stable priors enable the recovery of signals with jump discontinuities or sharp features. Sampling is performed via random series expansion with $\alpha$-stable variables, and posterior well-posedness is established via compatibility between prior tail decay and likelihood regularity (Sullivan, 2016, Hosseini, 2016).
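A toy version of this workflow is sketched below, under assumptions that are not taken from the cited works: the forward map is point evaluation of a truncated sine series, the coefficients are $\gamma_n \xi_n$ with Cauchy $\xi_n$ and a hypothetical decay $\gamma_n = n^{-2}$, the noise is additive Gaussian, and the sampler is a plain random-walk Metropolis chain on the coefficients $\xi$. Plain random-walk Metropolis is used only for brevity; it is not dimension-robust and is not the algorithm of the references.

```python
import numpy as np

rng = np.random.default_rng(2)
n_terms, n_obs, noise_sd = 20, 15, 0.05
n = np.arange(1, n_terms + 1)
gamma = n ** -2.0                                   # assumed scale decay
x_obs = np.linspace(0.05, 0.95, n_obs)
# Forward map: coefficients xi -> point values of sum_n gamma_n xi_n psi_n(x).
A = np.sqrt(2.0) * np.sin(np.pi * np.outer(x_obs, n)) * gamma

xi_true = rng.standard_cauchy(n_terms)              # synthetic ground truth
y = A @ xi_true + noise_sd * rng.standard_normal(n_obs)

def log_post(xi):
    misfit = 0.5 * np.sum((y - A @ xi) ** 2) / noise_sd ** 2   # Gaussian-noise misfit
    return -misfit - np.sum(np.log1p(xi ** 2))                 # plus Cauchy log-prior

xi = np.zeros(n_terms)
lp = log_post(xi)
step, n_steps, accepted = 0.02, 20_000, 0
chain = np.empty((n_steps, n_terms))
for k in range(n_steps):
    prop = xi + step * rng.standard_normal(n_terms)            # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:                   # Metropolis accept step
        xi, lp, accepted = prop, lp_prop, accepted + 1
    chain[k] = xi
print("acceptance rate:", accepted / n_steps)
```

Because the misfit is non-negative, the unnormalized posterior is bounded by the prior density, which is the simplest instance of the compatibility between prior tails and likelihood described above. The step size typically needs retuning when the noise level or truncation level changes.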
High-Dimensional Regression and Feature Selection
Student-$t$ (in particular Cauchy: $\nu = 1$) and ultra-heavy-tailed log-scale priors enable robust feature selection and sparse estimation in settings such as gene selection from microarray data. They are reported to outperform traditional LASSO and group-LASSO methods by more aggressively discarding irrelevant coefficients and automatically selecting representatives in correlated groups (Li et al., 2013, Jiang et al., 2016, Schmidt et al., 2018).
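Student-$t$ priors are convenient in Gibbs-type samplers because they can be written as scale mixtures of Gaussians. The sketch below uses this standard representation, which is a general fact and not a description of the samplers in the cited papers: if $\lambda \sim \text{Inverse-Gamma}(\nu/2, \nu/2)$ and $x \mid \lambda \sim N(0, \lambda)$, then $x$ is Student-$t$ with $\nu$ degrees of freedom. The function name `student_t_via_mixture` is illustrative.

```python
import numpy as np

def student_t_via_mixture(nu, size, rng):
    """Draw Student-t variates as a Gaussian with a random variance."""
    lam = nu / rng.chisquare(nu, size)          # lam ~ Inverse-Gamma(nu/2, nu/2)
    return np.sqrt(lam) * rng.standard_normal(size)

rng = np.random.default_rng(3)
x_mix = student_t_via_mixture(nu=1.0, size=200_000, rng=rng)

# For nu = 1 the law is standard Cauchy, whose absolute value has median 1.
median_abs = np.median(np.abs(x_mix))
print(median_abs)
```

The median is used as the check because the Cauchy case has no finite mean or variance to compare against.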
Nonparametric and Functional Regression
Heavy-tailed priors with "oversmoothed" scaling (OT priors) on orthonormal basis coefficients achieve near-optimal minimax posterior contraction rates in Sobolev and Besov function spaces over a wide range of $L^p$-type losses, including both regular and sparse regimes. The oversmoothed decay of the coefficient scales is necessary for adaptation across unknown smoothness; polynomial scalings are insufficient (Agapiou et al., 21 May 2025).
Bayesian Deep Learning and Generative Modeling
Recent developments use heavy-tailed stable or Student-$t$ priors on neural network weights to allow flexible function approximation without explicit model selection over the architecture. Such priors place sufficient mass on large weight values, enabling DNNs to achieve minimax rates for compositional Hölder or Besov classes, and improve robustness in generative modeling tasks (e.g., via flow-matching in convex domains) (Castillo et al., 2024, Guan et al., 10 Oct 2025).
6. Robust Estimation, Practical Recommendations, and Limitations
In robust inference for low-data or poorly identified regimes (e.g., astronomical distance estimation from parallaxes), half-Cauchy and product half-Cauchy priors, both with polynomially decaying tails, provide robustness against likelihood explosion, delay the growth of posterior risk, and are easily implemented with a single scale hyperparameter (Ghosh et al., 2024).
However, the "curse of a single observation" means that the prior cannot ultimately prevent quadratic risk from diverging as prior information vanishes; heavy tails only delay this divergence (Ghosh et al., 2024). In Bayesian inverse problems, $\alpha$-stable priors guarantee only existence of the posterior, not TV or Hellinger stability, when $\alpha < 2$ (Hosseini, 2016).
7. Theoretical Summary and Outlook
Heavy-tailed stable priors, constructed via $\alpha$-stable, Lévy, or related infinitely divisible measures, offer a principled pathway to robust, adaptive, and edge-preserving Bayesian inference in high- and infinite-dimensional spaces. They require careful tuning of coefficient scaling to guarantee minimax adaptation, and their use rests on a precise balance between prior tail decay and likelihood growth. Together with closely related heavy-tailed classes (e.g., Student-$t$, half-Cauchy, product half-Cauchy, horseshoe), they provide a flexible toolbox for applications spanning regression, signal processing, modern machine learning, and uncertainty quantification in inverse problems (Sullivan, 2016, Hosseini, 2016, Agapiou et al., 21 May 2025, Ghosh et al., 2024, Castillo et al., 2024).