Geometric Jensen-Shannon Divergence
- Geometric Jensen-Shannon Divergence (GJS) is a generalization of classical JSD that uses a geometric mean to interpolate between probability distributions.
- It provides closed-form expressions for distributions like Gaussians and exponential families, enhancing computational efficiency in machine learning.
- GJS exhibits strong metric and geometric properties, making it a robust tool for statistical inference, regularization, and clustering applications.
The Geometric Jensen-Shannon divergence (GJS) is a parametric generalization of the classical Jensen-Shannon divergence (JSD) that uses the geometric mean, rather than the arithmetic mean, to interpolate between two probability distributions or densities. This geometric construction provides several key advantages, including closed-form expressions for broad distribution families (notably Gaussians), natural connections to information geometry, metric and regularization properties, and computationally tractable implementations for machine learning applications.
1. Foundational Definition and Construction
The classical Jensen-Shannon divergence for probability distributions $p$ and $q$ is defined as

$$\mathrm{JSD}(p, q) = \frac{1}{2}\,\mathrm{KL}(p \,\|\, m) + \frac{1}{2}\,\mathrm{KL}(q \,\|\, m), \qquad m = \frac{p + q}{2},$$

where $\mathrm{KL}$ denotes the Kullback-Leibler divergence and the mean $m$ is an arithmetic mixture.
The Geometric Jensen-Shannon divergence generalizes this construction by replacing the arithmetic mixture with a normalized weighted geometric mean: for skew parameter $\alpha \in (0, 1)$, the geometric mean density is

$$G_\alpha(p, q)(x) = \frac{p(x)^{1-\alpha}\, q(x)^{\alpha}}{Z_\alpha(p, q)}, \qquad Z_\alpha(p, q) = \int p(t)^{1-\alpha}\, q(t)^{\alpha}\, dt,$$

and the divergence is

$$\mathrm{GJS}_\alpha(p, q) = (1-\alpha)\,\mathrm{KL}\big(p \,\|\, G_\alpha(p, q)\big) + \alpha\,\mathrm{KL}\big(q \,\|\, G_\alpha(p, q)\big).$$

This general form encompasses both the symmetric case $\alpha = 1/2$ and arbitrary skewings. At the endpoints $\alpha = 0$ and $\alpha = 1$, the geometric mean collapses to $p$ and $q$ respectively, and the family reduces to known divergences; in particular, the dual skewing convention of (2006.10599) recovers the forward $\mathrm{KL}(p \,\|\, q)$ and reverse $\mathrm{KL}(q \,\|\, p)$ at the two endpoints.
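As a concrete reference point, the following is a minimal NumPy sketch of this definition for discrete distributions; the function name `geometric_jsd`, the smoothing constant `eps`, and the example vectors are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def geometric_jsd(p, q, alpha=0.5, eps=1e-12):
    """Skew geometric Jensen-Shannon divergence GJS_alpha(p, q), in nats.

    p, q: 1-D arrays of probabilities. Uses the normalized weighted
    geometric mean g ~ p^(1-alpha) * q^alpha and returns
    (1-alpha)*KL(p || g) + alpha*KL(q || g).
    """
    p = np.asarray(p, dtype=float) + eps  # smooth away exact zeros
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()

    g = p ** (1.0 - alpha) * q ** alpha
    g /= g.sum()  # divide by Z_alpha(p, q)

    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return (1.0 - alpha) * kl(p, g) + alpha * kl(q, g)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
print(geometric_jsd(p, q, alpha=0.5))   # symmetric geometric JSD
print(geometric_jsd(p, q, alpha=0.25))  # skewed toward p
```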
2. Analytical Solutions for Exponential Families and Gaussians
A central utility of the GJS is its closed-form expression for parametric families where the arithmetic mixture does not yield a tractable formula. Specifically, in exponential families:
Let $p_\theta(x) = \exp\big(\langle \theta, t(x)\rangle - F(\theta)\big)$ denote a canonical exponential family density with natural parameter $\theta$, sufficient statistic $t(x)$, and log-partition function $F$. Then, for natural parameters $\theta_1, \theta_2$ and $\alpha \in (0, 1)$,

$$G_\alpha(p_{\theta_1}, p_{\theta_2}) = p_{\theta_\alpha}, \qquad \theta_\alpha := (1-\alpha)\,\theta_1 + \alpha\,\theta_2,$$

i.e., the family is closed under weighted geometric means, and the normalization constant can be written using the log-partition function $F$ as

$$Z_\alpha(\theta_1, \theta_2) = \exp\big(F(\theta_\alpha) - (1-\alpha)F(\theta_1) - \alpha F(\theta_2)\big).$$

The divergence becomes

$$\mathrm{GJS}_\alpha(p_{\theta_1} : p_{\theta_2}) = (1-\alpha)\, B_F(\theta_\alpha : \theta_1) + \alpha\, B_F(\theta_\alpha : \theta_2),$$

where $B_F$ is the Bregman divergence induced by $F$, using the identity $\mathrm{KL}(p_{\theta} : p_{\theta'}) = B_F(\theta' : \theta)$ (1904.04017).
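To make the closed form concrete, the following sketch instantiates it for the Bernoulli family, with log-partition $F(\theta) = \log(1 + e^\theta)$, and checks the Bregman-based expression against direct computation on the two-point sample space. Names such as `gjs_exp_family` are illustrative.

```python
import numpy as np

# Bernoulli as a canonical exponential family:
#   p_theta(x) = exp(theta * x - F(theta)),  x in {0, 1},
#   F(theta) = log(1 + exp(theta)),  mean F'(theta) = sigmoid(theta).
F = lambda t: np.log1p(np.exp(t))
dF = lambda t: 1.0 / (1.0 + np.exp(-t))  # F'(theta)

def bregman_F(t1, t2):
    """Bregman divergence B_F(t1 : t2) induced by the log-partition F."""
    return F(t1) - F(t2) - (t1 - t2) * dF(t2)

def gjs_exp_family(t1, t2, alpha):
    """Closed-form GJS for a 1-parameter exponential family (see text)."""
    ta = (1 - alpha) * t1 + alpha * t2  # interpolated natural parameter
    return (1 - alpha) * bregman_F(ta, t1) + alpha * bregman_F(ta, t2)

def gjs_direct(t1, t2, alpha):
    """Direct computation on the 2-point sample space {0, 1}."""
    p = np.array([1 - dF(t1), dF(t1)])
    q = np.array([1 - dF(t2), dF(t2)])
    g = p ** (1 - alpha) * q ** alpha
    g /= g.sum()  # normalized geometric mean, equal to p_{theta_alpha}
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return (1 - alpha) * kl(p, g) + alpha * kl(q, g)

t1, t2, alpha = -0.3, 1.7, 0.25
print(gjs_exp_family(t1, t2, alpha))  # closed form via Bregman divergences
print(gjs_direct(t1, t2, alpha))      # numerically identical
```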
For multivariate Gaussians $N(\mu_1, \Sigma_1)$ and $N(\mu_2, \Sigma_2)$, the geometric mean yields another Gaussian $N(\mu_\alpha, \Sigma_\alpha)$ with

$$\Sigma_\alpha = \big((1-\alpha)\,\Sigma_1^{-1} + \alpha\,\Sigma_2^{-1}\big)^{-1}, \qquad \mu_\alpha = \Sigma_\alpha\big((1-\alpha)\,\Sigma_1^{-1}\mu_1 + \alpha\,\Sigma_2^{-1}\mu_2\big),$$

and GJS admits a closed-form solution for the divergence in terms of traces, log-determinants, and Mahalanobis terms (2006.10599, 2506.10494).
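A minimal NumPy sketch of the Gaussian closed form, assuming the skewing convention $G_\alpha \propto p^{1-\alpha} q^\alpha$ used above (the helper name `gjs_gaussian` is illustrative):

```python
import numpy as np

def gjs_gaussian(mu1, S1, mu2, S2, alpha=0.5):
    """Closed-form skew GJS between N(mu1, S1) and N(mu2, S2), in nats.

    Computes (1-alpha)*KL(N1 || N_a) + alpha*KL(N2 || N_a), where N_a is
    the normalized geometric mean N(mu_a, S_a) given in the text.
    """
    d = len(mu1)
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)  # precision matrices
    Pa = (1 - alpha) * P1 + alpha * P2             # S_a^{-1}
    Sa = np.linalg.inv(Pa)
    mua = Sa @ ((1 - alpha) * P1 @ mu1 + alpha * P2 @ mu2)

    def kl(mu0, S0, mu1_, S1_):
        """KL(N(mu0, S0) || N(mu1_, S1_)) in nats."""
        P = np.linalg.inv(S1_)
        diff = mu1_ - mu0
        return 0.5 * (np.trace(P @ S0) + diff @ P @ diff - d
                      + np.log(np.linalg.det(S1_) / np.linalg.det(S0)))

    return (1 - alpha) * kl(mu1, S1, mua, Sa) + alpha * kl(mu2, S2, mua, Sa)

# Example usage with two 2-D Gaussians.
mu1, S1 = np.zeros(2), np.eye(2)
mu2, S2 = np.array([1.0, -0.5]), np.array([[2.0, 0.3], [0.3, 1.0]])
print(gjs_gaussian(mu1, S1, mu2, S2, alpha=0.5))
```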
3. Metric and Geometric Properties
The GJS inherits and extends important geometric properties of the classical JSD:
- Metric property: The square root of the Geometric Jensen-Shannon divergence, when defined over probability vectors or quantum density matrices, is a metric—i.e., it is symmetric, non-negative, satisfies the triangle inequality, and vanishes iff the arguments coincide (1105.2707, 1910.10447, 1911.02643).
- Information geometry: For exponential families, GJS is aligned with the geometry defined by the log-partition function and the associated Bregman divergences. The corresponding mean parameterization is natural for mixture modeling and clustering (1904.04017).
- Hierarchical extension: The GJS can be generalized to families of divergences parametrized by the skew parameter $\alpha$, or further via vector-skewings and abstract means (e.g., the harmonic mean for Cauchy distributions, as in (1904.04017, 1912.00610)), yielding flexible, tunable geometries.
Table: Metric Properties for Jensen-Shannon-type Divergences
| Divergence | Domain | Metric (square root)? | Reference |
|---|---|---|---|
| Classical JSD | Probability simplex | Yes | (1105.2707) |
| Geometric JSD (GJS) | Exponential family / Gaussian | Yes | (1904.04017, 2506.10494) |
| Quantum JSD (von Neumann) | Density matrices / HPD | Yes | (1910.10447, 1911.02643) |
| Tsallis/Generalized JSD | Probability simplex / HPD | Yes (parametric family) | (1911.02643, 0804.1653) |
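As a quick numerical sanity check (not a proof) of the metric property in the first row of the table, one can verify the triangle inequality for the square root of the classical JSD on random probability vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def jsd(p, q):
    """Classical Jensen-Shannon divergence (arithmetic mixture), in nats."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Check the triangle inequality for sqrt(JSD) on random triples.
worst = 0.0
for _ in range(10000):
    p, q, r = rng.dirichlet(np.ones(5), size=3)
    d_pq, d_qr, d_pr = (np.sqrt(jsd(p, q)), np.sqrt(jsd(q, r)),
                        np.sqrt(jsd(p, r)))
    worst = max(worst, d_pr - (d_pq + d_qr))
print("max violation:", worst)  # <= 0 up to floating-point noise
```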
4. Computational Techniques and Algorithmic Implementations
The geometric structure of GJS enables:
- Efficient computation for Gaussians and exponential families: all terms reduce to operations on means, covariances, or natural parameters, using log-determinants, traces, and Euclidean (or Hilbert-space) norms.
- Regularization mechanisms: In infinite-dimensional (Hilbert space) settings, log-determinant divergence and regularization techniques ensure the divergence remains well-defined and finite even when Gaussian measures are not mutually absolutely continuous (2506.10494).
- Dimensionality reduction and sketching: Embedding techniques leverage GJS for distance-preserving mapping of distributions into low-dimensional Euclidean spaces while maintaining the geometric or simplex structure (1503.05225).
- Clustering and mode-seeking: GJS serves as an objective for k-means-like clustering of distributions, and provides tractable centroid computation for parametric families (1904.04017, 1912.00610); a toy sketch follows this list.
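The following toy sketch illustrates the clustering use case for one-dimensional Gaussians under the closed-form GJS. The Lloyd-style loop and the centroid update (naive per-cluster parameter averaging) are heuristic illustrations, not the exact GJS centroid algorithm of (1904.04017).

```python
import numpy as np

rng = np.random.default_rng(1)

def gjs_1d(m1, v1, m2, v2, alpha=0.5):
    """Closed-form skew GJS between the 1-D Gaussians N(m1, v1), N(m2, v2)."""
    va = 1.0 / ((1 - alpha) / v1 + alpha / v2)           # geometric-mean variance
    ma = va * ((1 - alpha) * m1 / v1 + alpha * m2 / v2)  # geometric-mean mean
    kl = lambda m0, v0, m_, v_: 0.5 * (v0 / v_ + (m_ - m0) ** 2 / v_
                                       - 1 + np.log(v_ / v0))
    return (1 - alpha) * kl(m1, v1, ma, va) + alpha * kl(m2, v2, ma, va)

# Toy Lloyd-style clustering of (mean, variance) pairs under GJS.
points = [(rng.normal(c, 0.5), 1.0 + rng.random())
          for c in (0.0, 5.0) for _ in range(10)]
centroids = [points[0], points[-1]]  # one seed from each true cluster
for _ in range(10):
    # Assignment step: nearest centroid in GJS.
    labels = [min(range(2), key=lambda k: gjs_1d(*x, *centroids[k]))
              for x in points]
    # Update step: average parameters per cluster -- a heuristic stand-in
    # for the exact GJS centroid discussed in (1904.04017).
    centroids = [tuple(np.mean([x[i] for x, l in zip(points, labels) if l == k])
                       for i in (0, 1)) for k in range(2)]
print(labels)
```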
5. Applications in Machine Learning and Statistics
- Variational regularization: In VAEs and Bayesian neural networks, GJS is used as a replacement for KL-divergence, providing numerical stability, symmetry, and improved generalization (especially under heavy-tailed or non-Gaussian posteriors), and enabling explicit interpolation between forward and reverse KL regimes via the skew parameter (2006.10599, 2209.11366); a minimal sketch of such a regularizer follows this list.
- Two-sample testing and GANs: Representation-based generalizations of JSD (such as the RJSD) leverage uncentered covariance operators in RKHS to circumvent density estimation, providing robust two-sample tests and effective divergence objectives for generative modeling (2305.16446).
- Geometry-driven structural analysis: In the quantification of symmetry breaking in physical systems, GJS is used as a geometric measure to compare atomic densities before and after symmetry operations, yielding a sensitive, continuous measure of geometric deviation (2410.21880).
- Quantum information: The quantum GJS (via density matrices) provides a metric on quantum state space, extending the geometric aspects of the classical case to operator-algebraic settings (1910.10447, 1911.02643).
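Below is a minimal NumPy sketch of the variational-regularization use case: the GJS between a diagonal-Gaussian posterior and a standard-normal prior, computed dimension-wise in closed form as a drop-in for the usual KL term. The function name and the skew convention follow the exposition above, not any specific library API from (2006.10599).

```python
import numpy as np

def gjs_diag_gauss_to_std_normal(mu, logvar, alpha=0.5):
    """GJS(N(mu, diag(var)) || N(0, I)), a sketch of a VAE regularizer.

    mu, logvar: per-dimension posterior parameters, as in a standard VAE
    encoder head. Returns the summed per-dimension divergence in nats.
    """
    var = np.exp(logvar)
    # Geometric-mean Gaussian: precision = (1 - alpha)/var + alpha * 1.
    va = 1.0 / ((1 - alpha) / var + alpha)
    ma = va * (1 - alpha) * mu / var  # prior mean is 0
    kl = lambda m0, v0, m1, v1: 0.5 * (v0 / v1 + (m1 - m0) ** 2 / v1
                                       - 1 + np.log(v1 / v0))
    # Sum per-dimension contributions, as with the factorized KL term.
    return np.sum((1 - alpha) * kl(mu, var, ma, va)
                  + alpha * kl(0.0, 1.0, ma, va))

# Example: posterior N([0.3, -0.1], diag([0.5, 1.2])) against N(0, I).
print(gjs_diag_gauss_to_std_normal(np.array([0.3, -0.1]),
                                   np.log(np.array([0.5, 1.2]))))
```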
6. Extension to Infinite Dimensions and Regularization
For Gaussian measures on an infinite-dimensional Hilbert space, the GJS requires careful regularization to deal with the absence of a Lebesgue measure and the potential non-existence of the (classical) KL divergence. The regularized GJS employs log-determinant divergences of trace-class operators and Fredholm determinants, as described precisely in (2506.10494). As the regularization parameter tends to zero, the expression recovers the exact divergence for equivalent Gaussian measures.
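A finite-dimensional toy illustration of the regularization idea, assuming the regularizer simply shifts each covariance by $\gamma I$ so that all log-determinants stay finite; the operator-theoretic construction of (2506.10494) is substantially more delicate than this sketch.

```python
import numpy as np

def gjs_gaussian_reg(mu1, S1, mu2, S2, alpha=0.5, gamma=1e-6):
    """Gaussian GJS with covariances regularized as S + gamma*I.

    For nondegenerate pairs the value converges as gamma -> 0; for
    mutually singular pairs it grows, reflecting an infinite divergence.
    """
    d = len(mu1)
    S1 = S1 + gamma * np.eye(d)  # regularization step
    S2 = S2 + gamma * np.eye(d)
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    Sa = np.linalg.inv((1 - alpha) * P1 + alpha * P2)  # geometric-mean covariance
    mua = Sa @ ((1 - alpha) * P1 @ mu1 + alpha * P2 @ mu2)

    def kl(m0, S0, m1_, S1_):
        P = np.linalg.inv(S1_)
        diff = m1_ - m0
        _, ld0 = np.linalg.slogdet(S0)   # stable log-determinants
        _, ld1 = np.linalg.slogdet(S1_)
        return 0.5 * (np.trace(P @ S0) + diff @ P @ diff - d + ld1 - ld0)

    return (1 - alpha) * kl(mu1, S1, mua, Sa) + alpha * kl(mu2, S2, mua, Sa)

# A rank-deficient covariance is handled gracefully for any gamma > 0.
mu = np.zeros(2)
S_singular = np.array([[1.0, 1.0], [1.0, 1.0]])  # rank 1
for gamma in (1e-2, 1e-4, 1e-6):
    print(gamma, gjs_gaussian_reg(mu, S_singular, mu, np.eye(2), gamma=gamma))
```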
7. Connections, Generalizations, and Theoretical Significance
The GJS extends classical divergence families, connecting convexity-based generalizations (e.g., -convexity for Tsallis entropies (0804.1653)), symmetrized Bregman divergences, and information geometric perspectives (canonical and potential divergences in dually flat spaces (1808.06482)). It underpins dense hierarchies of inequalities among divergences (1111.6372), and admits flexible parameterization (including monoparametric and vector-skew families (1709.10153, 1912.00610)). Its metric character, geometric mean closure, and strong geometric interpretability make it foundational for both theoretical exploration and real-world practice in statistical inference, machine learning, signal processing, and quantum information.
References to main formulas and sections:
- Explicit definitions and derivations: (1904.04017, 2006.10599, 2506.10494)
- Metric property proofs: (1105.2707, 1910.10447, 1911.02643, 1709.10153)
- Information geometry perspective: (1808.06482)
- Machine learning and statistical applications: (2006.10599, 2209.11366, 2305.16446, 2410.21880)
- Infinite-dimensional regularization: (2506.10494)
- Generalizations and monotonic hierarchies: (1111.6372, 0804.1653)