Log-Probability Variance Overview
- Log-probability variance is a metric that quantifies the dispersion of the logarithm of positive random variables and probability densities.
- It is used to assess multiplicative variability, enabling robust statistical estimation through geometric means and Bayesian bootstrap approaches.
- In variational inference, reducing log-probability variance leads to tighter bounds and more stable optimization of evidence lower bounds.
Log-probability variance quantifies the spread or dispersion of the logarithm of random variables, probability densities, or likelihood ratios. This concept arises naturally across probabilistic inference, estimation theory, statistical uncertainty analysis, and information theory, especially when data and models are defined on the positive real line and multiplicative effects or proportional errors dominate. Its operational definitions, estimation properties, and interpretative roles are fundamental in modern statistics, variational inference, and entropy calculations.
1. Formal Definitions
For a positive random variable $X$, the log-probability variance or log-variance is defined as the variance of $\log X$: $\operatorname{Var}_{\log}(X) := \operatorname{Var}(\log X) = \mathbb{E}\left[(\log X - \mathbb{E}[\log X])^2\right]$. The choice of logarithmic base (natural, $\ln$, or base-10, $\log_{10}$) is context-dependent; in information theory and many statistical applications, the natural logarithm predominates.
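In the context of probability densities, for a random variable $X$ with density $f$, the variance of self-information is given by
$V(X) := \operatorname{Var}(-\log f(X)) = \mathbb{E}\left[(\log f(X))^2\right] - \left(\mathbb{E}[\log f(X)]\right)^2$
which, for many important distributional families, is a function of the log-variance plus a shape-dependent constant (0705.4045).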
For log-likelihood ratios in variational inference, let $w(z) = p(x, z)/q(z)$ with $z \sim q$; the log-probability variance is then $\operatorname{Var}_{z \sim q}[\log w(z)]$ (Richter et al., 2020, Huang et al., 2019).
2. Metric Structure and Geometric Interpretation
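Log-probability variance emerges naturally from the logarithmic metric on $(0, \infty)$: $d_{\log}(x, y) = |\log x - \log y|$. For random positive vectors, a corresponding metric is
$d_{\log}(\mathbf{x}, \mathbf{y}) = \left(\sum_{i=1}^{n} (\log x_i - \log y_i)^2\right)^{1/2}$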
The minimizer of the expected squared logarithmic distance to a sample set is the geometric mean, and the corresponding minimal value is the log-variance (Gzyl, 2017). The logarithmic metric is fundamentally multiplicative, and the log-variance quantifies dispersion in multiplicative terms rather than additive ones.
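The minimizing property of the geometric mean can be checked numerically. The sketch below is illustrative only: the data are simulated, and the names `mean_sq_log_distance`, `grid`, and `best` are hypothetical choices made here, not taken from the cited work. It scans candidate centers and compares the best one with the geometric mean, and the minimal loss with the (1/N) log-variance.

```python
import numpy as np

def mean_sq_log_distance(c, x):
    """Average squared logarithmic distance between a candidate center c and samples x."""
    return np.mean((np.log(x) - np.log(c)) ** 2)

rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.0, sigma=0.8, size=1_000)  # simulated positive data (assumption)

geo_mean = np.exp(np.mean(np.log(x)))  # sample geometric mean
log_var = np.var(np.log(x))            # 1/N variance of log x

# Scan candidate centers on a log-spaced grid centered on the geometric mean.
grid = geo_mean * np.exp(np.linspace(-1.0, 1.0, 401))
losses = np.array([mean_sq_log_distance(c, x) for c in grid])
best = grid[np.argmin(losses)]

print(f"best center on grid: {best:.6f}, geometric mean: {geo_mean:.6f}")
print(f"minimal loss: {losses.min():.6f}, log-variance: {log_var:.6f}")
```

For a candidate $c = G e^{\delta}$, where $G$ is the sample geometric mean, the loss equals the log-variance plus $\delta^2$, which is why the minimum sits at the geometric mean.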
3. Statistical Estimation, Uncertainty, and Bootstrap Methods
For small-sample, high-log-variance data, conventional uncertainty estimates based on the arithmetic mean and its standard error are inadequate, often producing unphysical negative lower confidence bounds. The appropriate approach involves transforming data to log space, where the log-variance is estimated as: $\hat{\sigma}^2_{\log} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\log x_i - \overline{\log x}\right)^2$, with $\overline{\log x} = \frac{1}{N}\sum_{i=1}^{N}\log x_i$. Bayesian bootstrap methods in log space, which assign Dirichlet$(1, \dots, 1)$-distributed weights to each datum, yield credible intervals that are more robust than standard bootstrap intervals, especially when the log-variance $\sigma^2_{\log}$ is large and the sample size $N$ is modest. The Bayesian bootstrap avoids the extreme lower-limit bias of the standard bootstrap, which arises when resampled replicates omit rare but dominating large values (Mostofian et al., 2018).
It is recommended to always report the empirical log-variance $\hat{\sigma}^2_{\log}$, geometric mean, and Dirichlet-based credible intervals for multiplicative or highly dispersed data. However, the authors caution that neither bootstrap nor Bayesian bootstrap fully resolves the systematic underestimation of the mean when the sample size $N$ is very small and the log-variance $\sigma^2_{\log}$ is large.
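A minimal sketch of a Bayesian bootstrap in log space follows. It assumes the statistic of interest is the geometric mean and uses flat Dirichlet(1, ..., 1) weights; the function name `bayesian_bootstrap_geo_mean`, the replicate count, and the simulated data are illustrative assumptions, and the exact procedure in the cited work may differ.

```python
import numpy as np

def bayesian_bootstrap_geo_mean(x, n_rep=4000, level=0.95, seed=0):
    """Credible interval for the geometric mean via Dirichlet-weighted log data.

    Hypothetical helper: each replicate draws weights from Dirichlet(1, ..., 1)
    and forms the weighted mean of log x; quantiles are mapped back with exp.
    """
    rng = np.random.default_rng(seed)
    log_x = np.log(np.asarray(x, dtype=float))
    weights = rng.dirichlet(np.ones(log_x.size), size=n_rep)  # shape (n_rep, N)
    rep_log_means = weights @ log_x                            # one value per replicate
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(rep_log_means, [alpha, 1.0 - alpha])
    return np.exp(lo), np.exp(hi)

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=2.0, size=15)  # small, highly dispersed sample (assumption)

geo_mean = np.exp(np.mean(np.log(data)))
log_var = np.var(np.log(data), ddof=1)
lower, upper = bayesian_bootstrap_geo_mean(data)

print(f"log-variance: {log_var:.3f}")
print(f"geometric mean: {geo_mean:.3f}, 95% credible interval: ({lower:.3f}, {upper:.3f})")
```

Because the interval is built in log space and exponentiated, its lower limit is always positive, unlike an interval formed from the arithmetic mean and its standard error.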
4. Log-probability Variance in Variational Inference
In variational inference, the variance of the log-likelihood ratio under the variational approximation $q$,
$\operatorname{Var}_{z \sim q}\left[\log w(z)\right], \qquad w(z) = \frac{p(x, z)}{q(z)},$
controls the tightness of the variational bound on $\log p(x)$. The so-called variational gap $\log p(x) - \mathbb{E}_{z \sim q}[\log w(z)]$ is upper-bounded by a measure of the dispersion of the likelihood ratio $w$.
Thus, variance-reduction techniques that concentrate the distribution of $w$ or $\log w$ (e.g., averaging, correlated sampling) directly tighten the bound on the variational bias (Huang et al., 2019).
The log-variance loss
$\mathcal{L}_r(q_\phi) = \frac{1}{2}\operatorname{Var}_{z \sim r}\left[\log \frac{q_\phi(z)}{p(z \mid x)}\right],$
where $r$ is a reference distribution used for sampling, yields, upon differentiation with respect to $\phi$ and evaluation at $r = q_\phi$, the gradient of the (negative) Evidence Lower Bound (ELBO), providing a variance-reduced estimator (VarGrad) that requires no explicit differentiation through Monte Carlo samples (Richter et al., 2020). Because $\log p(z \mid x)$ and $\log p(x, z)$ differ only by the constant $\log p(x)$, the loss can be evaluated with the unnormalized joint density. This estimator achieves variance properties at least as good as, and in many settings better than, the REINFORCE score-function estimator, and is stable in high-dimensional models.
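The following toy sketch illustrates a log-variance gradient estimate in a case where the answer is known. It assumes a one-dimensional variational family $q_\phi = \mathcal{N}(\phi, 1)$ and an unnormalized target proportional to $\mathcal{N}(m, 1)$, so that the gradient of the KL divergence (the negative ELBO gradient) is exactly $\phi - m$. The function name `vargrad_gaussian` and all numerical settings are hypothetical choices made here for illustration.

```python
import numpy as np

def vargrad_gaussian(phi, m, n_samples, rng):
    """Gradient of half the sample variance of log(q/p), with samples held fixed.

    Toy setting (assumption): q_phi = N(phi, 1), unnormalized target ~ N(m, 1).
    """
    z = rng.normal(loc=phi, scale=1.0, size=n_samples)  # samples from q_phi
    log_q = -0.5 * (z - phi) ** 2   # log q_phi(z), additive constant dropped
    log_p = -0.5 * (z - m) ** 2     # unnormalized log target, constant dropped
    f = log_q - log_p               # log ratio; constants do not affect the variance
    score = z - phi                 # d/dphi log q_phi(z) for a unit-variance Gaussian
    # Differentiating 0.5 * Var_hat(f) w.r.t. phi at fixed samples gives this form.
    return np.sum((f - f.mean()) * score) / (n_samples - 1)

rng = np.random.default_rng(0)
phi, m = 1.5, -0.5
estimate = vargrad_gaussian(phi, m, 200_000, rng)
exact = phi - m  # gradient of KL(q_phi || p) = (phi - m)^2 / 2

print(f"log-variance gradient estimate: {estimate:.4f}, exact KL gradient: {exact:.4f}")
```

Subtracting the sample mean of the log ratio acts as a control variate, and no gradient is propagated through the sampled `z` values.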
5. Entropy and Information-theoretic Roles
For families of positive random variables closed under scaling and power transformations $X \mapsto aX^{b}$, the entropy $h(X) = \mathbb{E}[-\log f(X)]$ and the variance of self-information $V(X) = \operatorname{Var}(-\log f(X))$ admit closed forms in terms of the mean and variance of $\log X$. The constants that appear in these closed forms are specific to the distribution family but independent of the scaling or power parameters (0705.4045). This framework applies to generalized gamma, log-normal, exponential, and gamma distributions.
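For example, for the log-normal distribution,
$h(X) = \mu + \frac{1}{2}\log\left(2\pi e\,\sigma^2\right), \qquad V(X) = \sigma^2 + \frac{1}{2},$
where $\mu$ and $\sigma^2$ are the mean and variance of $\log X$, $h(X)$ is the differential entropy in nats, and $V(X)$ is the variance of self-information.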
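As a sanity check on the log-normal case, the sketch below compares Monte Carlo estimates against the standard closed forms: differential entropy $\mu + \frac{1}{2}\log(2\pi e \sigma^2)$ in nats and variance of self-information $\sigma^2 + \frac{1}{2}$, with $\mu$ and $\sigma^2$ the mean and variance of $\log X$. The parameter values and sample size are arbitrary assumptions chosen for illustration.

```python
import numpy as np

mu, sigma = 0.3, 1.2  # mean and standard deviation of log X (illustrative values)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=mu, sigma=sigma, size=500_000)

# Self-information -log f(x) for the log-normal density
# f(x) = exp(-(log x - mu)^2 / (2 sigma^2)) / (x * sigma * sqrt(2 pi)).
log_x = np.log(x)
self_info = (
    log_x
    + np.log(sigma * np.sqrt(2.0 * np.pi))
    + (log_x - mu) ** 2 / (2.0 * sigma**2)
)

entropy_mc = self_info.mean()  # Monte Carlo estimate of the entropy
entropy_exact = mu + 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)

v_mc = self_info.var()         # Monte Carlo variance of self-information
v_exact = sigma**2 + 0.5

print(f"entropy: MC {entropy_mc:.4f} vs closed form {entropy_exact:.4f}")
print(f"self-information variance: MC {v_mc:.4f} vs closed form {v_exact:.4f}")
```

The self-information variance exceeds the log-variance by the constant $\frac{1}{2}$, which is the variance of the quadratic term $(\log X - \mu)^2 / (2\sigma^2)$; the cross term vanishes by the symmetry of the Gaussian.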
6. Asymptotic Laws and Limit Behavior
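When independent and identically distributed $X_1, \dots, X_n$ are considered, the empirical geometric mean,
$\hat{G}_n = \left(\prod_{i=1}^{n} X_i\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n}\log X_i\right),$
converges almost surely to the population geometric mean $G = \exp(\mathbb{E}[\log X])$. The central limit theorem in logarithmic distance asserts that
$\sqrt{n}\left(\log \hat{G}_n - \log G\right) \xrightarrow{d} \mathcal{N}\left(0, \sigma^2_{\log}\right), \qquad \sigma^2_{\log} = \operatorname{Var}(\log X),$
and the scaled geometric mean $(\hat{G}_n / G)^{\sqrt{n}}$ converges in distribution to a log-normal, just as the arithmetic mean under Euclidean error converges to a Gaussian (Gzyl, 2017).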
Log-variance remains the natural measure of spread in this context, replacing the classical variance for multiplicative data. Empirical estimators for the geometric mean $G$ and the log-variance $\sigma^2_{\log}$ are unbiased and consistent in the logarithmic metric.
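The limit behavior of the geometric mean can be illustrated by simulation. The sketch below assumes unit-rate exponential data, for which $\mathbb{E}[\log X] = -\gamma$ (the Euler-Mascheroni constant) and $\operatorname{Var}(\log X) = \pi^2/6$; the sample size and number of trials are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trials = 400, 5_000
x = rng.exponential(scale=1.0, size=(n_trials, n))  # each row is one i.i.d. sample

log_G = -np.euler_gamma        # log of the population geometric mean for Exp(1)
sigma2_log = np.pi**2 / 6.0    # log-variance for Exp(1)

log_G_hat = np.log(x).mean(axis=1)          # log of the empirical geometric mean, per trial
scaled = np.sqrt(n) * (log_G_hat - log_G)   # should be approximately N(0, sigma2_log)

print(f"mean of scaled statistic: {scaled.mean():.4f} (expected near 0)")
print(f"variance of scaled statistic: {scaled.var():.4f} (expected near {sigma2_log:.4f})")
```

Exponentiating `scaled` gives the scaled geometric mean ratio, whose distribution across trials is approximately log-normal.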
7. Practical Computation and Empirical Use
Practical procedures for handling high log-variance data or log-probability estimates are summarized as follows:
- Always log-transform positive, multiplicatively dispersed data.
- Estimate sample log-mean and log-variance.
- Employ Bayesian bootstrap with Dirichlet weights in log space for uncertainty quantification (Mostofian et al., 2018).
- Use log-variance-based loss functions in variational inference for low-variance gradient estimates, avoiding high-variance score-function methods (Richter et al., 2020).
- In entropy/information-theoretic settings, compute sample means and variances of $\log X$ for parameter-free, closed-form estimates of entropy and self-information variance (0705.4045).
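The first two steps of the list above can be sketched as a small helper. The function name `log_space_summary` and its return format are hypothetical, chosen here for illustration; the sample variance uses the conventional $N - 1$ denominator.

```python
import numpy as np

def log_space_summary(x):
    """Return log-mean, log-variance, and geometric mean of strictly positive data."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log-space summaries require strictly positive data")
    log_x = np.log(x)
    log_mean = log_x.mean()
    log_var = log_x.var(ddof=1)  # sample variance with N - 1 denominator
    return {
        "log_mean": log_mean,
        "log_var": log_var,
        "geo_mean": np.exp(log_mean),
    }

summary = log_space_summary([1.0, 10.0, 100.0])  # data spanning two decades
print(summary)
```

For data spaced evenly in decades, as in this usage example, the geometric mean is the middle value and the log-variance is $(\ln 10)^2$.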
Taken together, log-probability variance is a central analytic quantity for describing, estimating, and optimizing in settings dominated by multiplicative randomness, non-Gaussian tails, or information-theoretic criteria. Its estimation and interpretation are critical wherever the arithmetic mean and variance fail to provide robust, physically meaningful, or unbiased results.