
Geometric Jensen-Shannon Divergence

Updated 30 June 2025
  • Geometric Jensen-Shannon Divergence (GJS) is a generalization of classical JSD that uses a geometric mean to interpolate between probability distributions.
  • It provides closed-form expressions for distributions like Gaussians and exponential families, enhancing computational efficiency in machine learning.
  • GJS exhibits strong metric and geometric properties, making it a robust tool for statistical inference, regularization, and clustering applications.

The Geometric Jensen-Shannon divergence (GJS) is a parametric generalization of the classical Jensen-Shannon divergence (JSD) that uses the geometric mean, rather than the arithmetic mean, to interpolate between two probability distributions or densities. This geometric construction provides several key advantages, including closed-form expressions for broad distribution families (notably Gaussians), natural connections to information geometry, metric and regularization properties, and computationally tractable implementations for machine learning applications.

1. Foundational Definition and Construction

The classical Jensen-Shannon divergence for probability distributions $P$ and $Q$ is defined as

$$\operatorname{JSD}(P\|Q) = \frac{1}{2}\,\mathrm{KL}\!\left(P \,\Big\|\, \frac{P+Q}{2}\right) + \frac{1}{2}\,\mathrm{KL}\!\left(Q \,\Big\|\, \frac{P+Q}{2}\right),$$

where $\mathrm{KL}(\cdot\|\cdot)$ denotes the Kullback-Leibler divergence and the mean is an arithmetic mixture.

The Geometric Jensen-Shannon divergence generalizes this construction by replacing the arithmetic mixture with the geometric mean:

$$\operatorname{GJS}_\alpha(P\|Q) = (1-\alpha)\,\mathrm{KL}\!\left(P \,\big\|\, G_\alpha(P,Q)\right) + \alpha\,\mathrm{KL}\!\left(Q \,\big\|\, G_\alpha(P,Q)\right),$$

where, for $0 \leq \alpha \leq 1$, the geometric mean density is

$$G_\alpha(P, Q)(x) = \frac{p(x)^{1-\alpha}\, q(x)^\alpha}{Z_\alpha^G(P:Q)}, \qquad Z_\alpha^G(P:Q) = \int p(x)^{1-\alpha} q(x)^\alpha \, dx.$$

This general form encompasses both the symmetric case $(\alpha = 1/2)$ and arbitrary skewings. At the endpoints $\alpha = 0$ and $\alpha = 1$ the GJS reduces to known limiting cases, and the skew parameter allows interpolation between forward and reverse KL behaviour (see Section 5).
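
To make the definition concrete, here is a minimal NumPy sketch (not from the cited papers) that evaluates $\operatorname{GJS}_\alpha$ for two discrete distributions directly from the formulas above; the function name `geometric_jsd` and the smoothing constant `eps` are illustrative choices.

```python
import numpy as np

def geometric_jsd(p, q, alpha=0.5, eps=1e-12):
    """Skew geometric Jensen-Shannon divergence between two discrete
    distributions p and q (1-D arrays summing to one).

    Evaluates GJS_alpha(P||Q) = (1-alpha) KL(P||G_alpha) + alpha KL(Q||G_alpha),
    with G_alpha the normalized weighted geometric mean of p and q.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()

    g = p ** (1.0 - alpha) * q ** alpha   # unnormalized geometric mean
    g /= g.sum()                          # divide by Z_alpha^G

    kl = lambda a, b: np.sum(a * np.log(a / b))
    return (1.0 - alpha) * kl(p, g) + alpha * kl(q, g)

# Example: two distributions on a four-point alphabet
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])
print(geometric_jsd(p, q, alpha=0.5))
```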

2. Analytical Solutions for Exponential Families and Gaussians

A central utility of the GJS is its closed-form expression for parametric families where the arithmetic mixture does not yield a tractable formula. Specifically, in exponential families:

Let $p_\theta(x) = \exp(\langle \theta, t(x) \rangle - F(\theta))$ denote a canonical exponential family density. Then, for $p_{\theta_0}$ and $p_{\theta_1}$,

$$G_\alpha(p_{\theta_0}, p_{\theta_1})(x) = p_{(1-\alpha)\theta_0 + \alpha\theta_1}(x),$$

and the normalization constant can be written using the log-partition function $F(\theta)$ as $Z_\alpha^G = \exp\!\big(F((1-\alpha)\theta_0 + \alpha\theta_1) - (1-\alpha)F(\theta_0) - \alpha F(\theta_1)\big)$. The divergence becomes

$$\operatorname{GJS}_\alpha(p_{\theta_0}\|p_{\theta_1}) = (1-\alpha)\,\mathrm{KL}(p_{\theta_0} \| p_{\theta_\alpha}) + \alpha\,\mathrm{KL}(p_{\theta_1} \| p_{\theta_\alpha}),$$

where $\theta_\alpha = (1-\alpha)\theta_0 + \alpha\theta_1$ (1904.04017).
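
As a sanity check of this natural-parameter interpolation, the following sketch (an illustration, not a reference implementation) computes the GJS between two Poisson distributions, whose natural parameter is $\theta = \log\lambda$, both via the closed form above and by direct evaluation on a truncated support; SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import poisson

def gjs_poisson_closed_form(lam0, lam1, alpha=0.5):
    """GJS between Poisson(lam0) and Poisson(lam1) via natural parameters.

    The weighted geometric mean of two Poissons is again Poisson with rate
    lam_alpha = lam0**(1-alpha) * lam1**alpha (interpolation of the natural
    parameters theta = log(lambda)); KL(Poi(a) || Poi(b)) = a*log(a/b) + b - a.
    """
    lam_a = lam0 ** (1 - alpha) * lam1 ** alpha
    kl = lambda a, b: a * np.log(a / b) + b - a
    return (1 - alpha) * kl(lam0, lam_a) + alpha * kl(lam1, lam_a)

def gjs_poisson_numeric(lam0, lam1, alpha=0.5, n_max=60):
    """Direct evaluation of the definition on a truncated support."""
    x = np.arange(n_max)
    p, q = poisson.pmf(x, lam0), poisson.pmf(x, lam1)
    g = p ** (1 - alpha) * q ** alpha
    g /= g.sum()                          # normalize the geometric mean
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return (1 - alpha) * kl(p, g) + alpha * kl(q, g)

print(gjs_poisson_closed_form(3.0, 7.0))  # closed form via natural parameters
print(gjs_poisson_numeric(3.0, 7.0))      # numerical check (approximately equal)
```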

For multivariate Gaussians $N(\mu_i, C_i)$, the geometric mean yields

$$C_\alpha = \left[(1-\alpha)C_0^{-1} + \alpha C_1^{-1}\right]^{-1}, \qquad \mu_\alpha = C_\alpha\left((1-\alpha) C_0^{-1}\mu_0 + \alpha C_1^{-1}\mu_1\right),$$

and GJS admits a closed-form solution for the divergence (2006.10599, 2506.10494).
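
Under the stated Gaussian formulas, the divergence can be assembled from the standard closed-form KL between Gaussians; the sketch below is a minimal NumPy illustration (names such as `gjs_gauss` are ours, not from the cited papers).

```python
import numpy as np

def kl_gauss(mu0, C0, mu1, C1):
    """KL(N(mu0, C0) || N(mu1, C1)) for multivariate Gaussians."""
    d = mu0.shape[0]
    C1_inv = np.linalg.inv(C1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(C1_inv @ C0) + diff @ C1_inv @ diff
                  - d + np.log(np.linalg.det(C1) / np.linalg.det(C0)))

def gjs_gauss(mu0, C0, mu1, C1, alpha=0.5):
    """Closed-form GJS_alpha between two multivariate Gaussians.

    The geometric mean of the two Gaussians is N(mu_alpha, C_alpha) with
    C_alpha = [(1-alpha) C0^{-1} + alpha C1^{-1}]^{-1} and
    mu_alpha = C_alpha ((1-alpha) C0^{-1} mu0 + alpha C1^{-1} mu1).
    """
    P0, P1 = np.linalg.inv(C0), np.linalg.inv(C1)      # precision matrices
    C_a = np.linalg.inv((1 - alpha) * P0 + alpha * P1)
    mu_a = C_a @ ((1 - alpha) * P0 @ mu0 + alpha * P1 @ mu1)
    return ((1 - alpha) * kl_gauss(mu0, C0, mu_a, C_a)
            + alpha * kl_gauss(mu1, C1, mu_a, C_a))

mu0, C0 = np.zeros(2), np.eye(2)
mu1, C1 = np.array([1.0, -1.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
print(gjs_gauss(mu0, C0, mu1, C1, alpha=0.5))
```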

3. Metric and Geometric Properties

The GJS inherits and extends important geometric properties of the classical JSD:

  • Metric property: The square root of the Geometric Jensen-Shannon divergence, when defined over probability vectors or quantum density matrices, is a metric: it is symmetric, non-negative, satisfies the triangle inequality, and vanishes iff the arguments coincide (1105.2707, 1910.10447, 1911.02643); a numerical sanity check follows the table below.
  • Information geometry: For exponential families, GJS is aligned with the geometry defined by the log-partition function and the associated Bregman divergences. The corresponding mean parameterization is natural for mixture modeling and clustering (1904.04017).
  • Hierarchical extension: The GJS can be generalized to families of divergences parametrized by the skew parameter $\alpha$, or further via vector-skewings and abstract means (e.g., the harmonic mean for Cauchy distributions, as in (1904.04017, 1912.00610)), yielding flexible, tunable geometries.

Table: Metric Properties for Jensen-Shannon-type Divergences

| Divergence | Domain | Metric after square root? | Reference |
|---|---|---|---|
| Classical JSD | Probability simplex | Yes | (1105.2707) |
| Geometric JSD (GJS) | Exponential family / Gaussian | Yes | (1904.04017, 2506.10494) |
| Quantum JSD (von Neumann) | Density matrices / HPD matrices | Yes | (1910.10447, 1911.02643) |
| Tsallis/Generalized JSD | Probability simplex / HPD matrices | Yes (parametric family) | (1911.02643, 0804.1653) |
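
As an empirical illustration of the metric claim in the first row of the table (not a proof), the following sketch samples random probability vectors and checks the triangle inequality for the square root of the classical JSD.

```python
import numpy as np

def jsd(p, q):
    """Classical Jensen-Shannon divergence between discrete distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
violations = 0
for _ in range(10_000):
    p, q, r = rng.dirichlet(np.ones(5), size=3)     # random probability vectors
    d = lambda a, b: np.sqrt(jsd(a, b))             # candidate metric: sqrt of JSD
    if d(p, r) > d(p, q) + d(q, r) + 1e-12:         # triangle inequality check
        violations += 1
print("triangle-inequality violations:", violations)  # expected: 0
```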

4. Computational Techniques and Algorithmic Implementations

The geometric structure of GJS enables:

  • Efficient computation for Gaussians and exponential families: All terms are reduced to operations involving means, covariances, or natural parameters; log-determinants, traces, and Euclidean (or Hilbert-space) norms are utilized.
  • Regularization mechanisms: In infinite-dimensional (Hilbert space) settings, log-determinant divergence and regularization techniques ensure the divergence remains well-defined and finite even when Gaussian measures are not mutually absolutely continuous (2506.10494).
  • Dimensionality reduction and sketching: Embedding techniques leverage GJS for distance-preserving mapping of distributions into low-dimensional Euclidean spaces while maintaining the geometric or simplex structure (1503.05225).
  • Clustering and mode-seeking: GJS serves as an objective for $k$-means-like clustering of distributions, and provides tractable centroid computation for parametric families (1904.04017, 1912.00610); see the sketch after this list.
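
The clustering bullet above can be illustrated with a toy $k$-means-style loop over univariate Gaussians, where distances are the closed-form GJS and centroids are obtained by averaging natural parameters; this centroid update is a simplification chosen for illustration and is not the centroid construction derived in (1904.04017).

```python
import numpy as np

def gjs_gauss1d(m0, v0, m1, v1, alpha=0.5):
    """Closed-form GJS between univariate Gaussians N(m0, v0) and N(m1, v1)."""
    kl = lambda ma, va, mb, vb: 0.5 * (va / vb + (mb - ma) ** 2 / vb - 1 + np.log(vb / va))
    p0, p1 = 1 / v0, 1 / v1                     # precisions
    va = 1 / ((1 - alpha) * p0 + alpha * p1)    # geometric-mean variance
    ma = va * ((1 - alpha) * p0 * m0 + alpha * p1 * m1)
    return (1 - alpha) * kl(m0, v0, ma, va) + alpha * kl(m1, v1, ma, va)

def gjs_kmeans(params, k=2, n_iter=20, seed=0):
    """k-means-style clustering of 1-D Gaussians (rows of params = [mean, var]).

    Distances use the closed-form GJS; centroids are updated by averaging the
    natural parameters (1/var, mean/var) of the assigned Gaussians -- a simple
    illustrative choice, not the centroid construction of the cited papers.
    """
    rng = np.random.default_rng(seed)
    centers = params[rng.choice(len(params), k, replace=False)].copy()
    for _ in range(n_iter):
        d = np.array([[gjs_gauss1d(m, v, cm, cv) for cm, cv in centers]
                      for m, v in params])
        labels = d.argmin(axis=1)
        for j in range(k):
            members = params[labels == j]
            if len(members):
                eta1 = np.mean(1 / members[:, 1])              # mean precision
                eta2 = np.mean(members[:, 0] / members[:, 1])  # mean (precision * mean)
                centers[j] = [eta2 / eta1, 1 / eta1]
    return labels, centers

params = np.array([[0.0, 1.0], [0.2, 1.2], [5.0, 0.5], [5.5, 0.7]])
labels, centers = gjs_kmeans(params, k=2)
print(labels, centers)
```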

5. Applications in Machine Learning and Statistics

  • Variational regularization: In VAEs and Bayesian neural networks, GJS is used as a replacement for KL-divergence, providing numerical stability, symmetry, and improved generalization (especially under heavy-tailed or non-Gaussian posteriors), and enabling explicit interpolation between forward and reverse KL regimes via the skew parameter (2006.10599, 2209.11366); a minimal regularizer sketch follows this list.
  • Two-sample testing and GANs: Representation-based generalizations of JSD (such as the RJSD) leverage uncentered covariance operators in RKHS to circumvent density estimation, providing robust two-sample tests and effective divergence objectives for generative modeling (2305.16446).
  • Geometry-driven structural analysis: In the quantification of symmetry breaking in physical systems, GJS is used as a geometric measure to compare atomic densities before and after symmetry operations, yielding a sensitive, continuous measure of geometric deviation (2410.21880).
  • Quantum information: The quantum GJS (via density matrices) provides a metric on quantum state space, extending the geometric aspects of the classical case to operator-algebraic settings (1910.10447, 1911.02643).
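
As an illustration of the variational-regularization use mentioned above, the sketch below evaluates the closed-form GJS between a diagonal Gaussian posterior and a standard normal prior, the quantity that would replace the KL term in a VAE loss. It is written in plain NumPy for clarity (an autodiff framework would be used in practice), and the function names are ours.

```python
import numpy as np

def kl_diag_gauss(m0, v0, m1, v1):
    """Sum of elementwise KL(N(m0, v0) || N(m1, v1)) for diagonal Gaussians."""
    return 0.5 * np.sum(v0 / v1 + (m1 - m0) ** 2 / v1 - 1 + np.log(v1 / v0))

def gjs_regularizer(mu, log_var, alpha=0.5):
    """GJS_alpha(q(z|x) || p(z)) for a diagonal Gaussian posterior
    N(mu, diag(exp(log_var))) and a standard normal prior.

    Because all covariances are diagonal, the geometric-mean Gaussian is also
    diagonal and the divergence is a sum of per-dimension closed forms.
    """
    v = np.exp(log_var)
    # geometric-mean Gaussian between posterior (weight 1-alpha) and prior (weight alpha)
    v_a = 1.0 / ((1 - alpha) / v + alpha)   # prior precision is 1
    m_a = v_a * (1 - alpha) * mu / v        # prior mean is 0
    return ((1 - alpha) * kl_diag_gauss(mu, v, m_a, v_a)
            + alpha * kl_diag_gauss(np.zeros_like(mu), np.ones_like(v), m_a, v_a))

mu = np.array([0.5, -0.3, 1.2])
log_var = np.array([-0.2, 0.1, -1.0])
print(gjs_regularizer(mu, log_var, alpha=0.5))
```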

6. Extension to Infinite Dimensions and Regularization

For Gaussian measures on infinite-dimensional Hilbert space, the GJS requires careful regularization to deal with the absence of Lebesgue measure and the potential non-existence of (classical) KL divergence. The regularized GJS employs log-determinant divergences of trace-class operators and Fredholm determinants:

$$\operatorname{GJS}_\alpha^\gamma(\mu_0 \| \mu_1) = \text{(quadratic mean terms)} + (1-\alpha)\, d^1_{\log\det}(C_0 + \gamma I,\, C_{\alpha,\gamma}) + \alpha\, d^1_{\log\det}(C_1 + \gamma I,\, C_{\alpha,\gamma}),$$

as described precisely in (2506.10494). As the regularization parameter $\gamma \to 0$, the expression recovers the exact divergence for equivalent measures.
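
The following is a finite-dimensional toy analogue of this regularization, assuming only that $\gamma I$ is added to each covariance before the closed-form Gaussian GJS of Section 2 is evaluated; it is not the Hilbert-space formula of (2506.10494), merely an illustration of why the shift by $\gamma I$ keeps the inverses and log-determinants well-defined for singular covariances.

```python
import numpy as np

def kl_gauss(mu0, C0, mu1, C1):
    """KL(N(mu0, C0) || N(mu1, C1)) for multivariate Gaussians."""
    d = len(mu0)
    P1 = np.linalg.inv(C1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(P1 @ C0) + diff @ P1 @ diff - d
                  + np.log(np.linalg.det(C1) / np.linalg.det(C0)))

def gjs_gauss_regularized(mu0, C0, mu1, C1, alpha=0.5, gamma=1e-3):
    """Toy finite-dimensional analogue of the regularized GJS: add gamma*I to
    each covariance so that inverses and log-determinants exist even when the
    original covariances are singular, then evaluate the Gaussian closed form."""
    I = np.eye(len(mu0))
    C0g, C1g = C0 + gamma * I, C1 + gamma * I
    P0, P1 = np.linalg.inv(C0g), np.linalg.inv(C1g)
    C_a = np.linalg.inv((1 - alpha) * P0 + alpha * P1)
    mu_a = C_a @ ((1 - alpha) * P0 @ mu0 + alpha * P1 @ mu1)
    return ((1 - alpha) * kl_gauss(mu0, C0g, mu_a, C_a)
            + alpha * kl_gauss(mu1, C1g, mu_a, C_a))

# Rank-deficient covariances: without gamma > 0 the inverses would not exist.
C0 = np.array([[1.0, 1.0], [1.0, 1.0]])
C1 = np.array([[2.0, 0.0], [0.0, 0.0]])
print(gjs_gauss_regularized(np.zeros(2), C0, np.array([1.0, 0.0]), C1))
```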

7. Connections, Generalizations, and Theoretical Significance

The GJS extends classical divergence families, connecting convexity-based generalizations (e.g., $q$-convexity for Tsallis entropies (0804.1653)), symmetrized Bregman divergences, and information geometric perspectives (canonical and potential divergences in dually flat spaces (1808.06482)). It underpins dense hierarchies of inequalities among divergences (1111.6372), and admits flexible parameterization (including monoparametric and vector-skew families (1709.10153, 1912.00610)). Its metric character, geometric mean closure, and strong geometric interpretability make it foundational for both theoretical exploration and real-world practice in statistical inference, machine learning, signal processing, and quantum information.

