
Geometric Jensen-Shannon Divergence

Updated 30 June 2025
  • Geometric Jensen-Shannon Divergence (GJS) is a generalization of classical JSD that uses a geometric mean to interpolate between probability distributions.
  • It provides closed-form expressions for distributions like Gaussians and exponential families, enhancing computational efficiency in machine learning.
  • GJS exhibits strong metric and geometric properties, making it a robust tool for statistical inference, regularization, and clustering applications.

The Geometric Jensen-Shannon divergence (GJS) is a parametric generalization of the classical Jensen-Shannon divergence (JSD) that uses the geometric mean, rather than the arithmetic mean, to interpolate between two probability distributions or densities. This geometric construction provides several key advantages, including closed-form expressions for broad distribution families (notably Gaussians), natural connections to information geometry, metric and regularization properties, and computationally tractable implementations for machine learning applications.

1. Foundational Definition and Construction

The classical Jensen-Shannon divergence for probability distributions $P$ and $Q$ is defined as

$$\operatorname{JSD}(P\|Q) = \frac{1}{2}\,\mathrm{KL}\!\left(P \,\Big\|\, \frac{P+Q}{2}\right) + \frac{1}{2}\,\mathrm{KL}\!\left(Q \,\Big\|\, \frac{P+Q}{2}\right),$$

where $\mathrm{KL}(\cdot\|\cdot)$ denotes the Kullback-Leibler divergence and the mean is an arithmetic mixture.

The Geometric Jensen-Shannon divergence generalizes this construction by replacing the arithmetic mixture with the geometric mean:

$$\operatorname{GJS}_\alpha(P\|Q) = (1-\alpha)\,\mathrm{KL}\!\left(P \,\big\|\, G_\alpha(P,Q)\right) + \alpha\,\mathrm{KL}\!\left(Q \,\big\|\, G_\alpha(P,Q)\right),$$

where, for $0 \leq \alpha \leq 1$, the geometric mean density is

$$G_\alpha(P, Q)(x) = \frac{p(x)^{1-\alpha}\, q(x)^\alpha}{Z_\alpha^G(P:Q)}, \qquad Z_\alpha^G(P:Q) = \int p(x)^{1-\alpha} q(x)^\alpha \, dx.$$

This general form encompasses both the symmetric case $(\alpha = 1/2)$ and arbitrary skewings. At the endpoints $\alpha = 0$ and $\alpha = 1$ the GJS reduces to known limiting cases, and the skew parameter allows interpolation between forward and reverse KL behaviour (see Section 5).
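
To make the definition concrete, here is a minimal NumPy sketch (not from the cited papers) that evaluates $\operatorname{GJS}_\alpha$ for two discrete distributions directly from the formulas above; the function name `geometric_jsd` and the smoothing constant `eps` are illustrative choices.

```python
import numpy as np

def geometric_jsd(p, q, alpha=0.5, eps=1e-12):
    """Skew geometric Jensen-Shannon divergence between two discrete
    distributions p and q (1-D arrays summing to one).

    Evaluates GJS_alpha(P||Q) = (1-alpha) KL(P||G_alpha) + alpha KL(Q||G_alpha),
    with G_alpha the normalized weighted geometric mean of p and q.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()

    g = p ** (1.0 - alpha) * q ** alpha   # unnormalized geometric mean
    g /= g.sum()                          # divide by Z_alpha^G

    kl = lambda a, b: np.sum(a * np.log(a / b))
    return (1.0 - alpha) * kl(p, g) + alpha * kl(q, g)

# Example: two distributions on a four-point alphabet
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])
print(geometric_jsd(p, q, alpha=0.5))
```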

2. Analytical Solutions for Exponential Families and Gaussians

A central utility of the GJS is its closed-form expression for parametric families where the arithmetic mixture does not yield a tractable formula. Specifically, in exponential families:

Let $p_\theta(x) = \exp(\langle \theta, t(x) \rangle - F(\theta))$ denote a canonical exponential family density. Then, for $p_{\theta_0}$ and $p_{\theta_1}$,

$$G_\alpha(p_{\theta_0}, p_{\theta_1})(x) = p_{(1-\alpha)\theta_0 + \alpha\theta_1}(x),$$

and the normalization constant can be written using the log-partition function $F(\theta)$ as $Z_\alpha^G = \exp\!\big(F((1-\alpha)\theta_0 + \alpha\theta_1) - (1-\alpha)F(\theta_0) - \alpha F(\theta_1)\big)$. The divergence becomes

$$\operatorname{GJS}_\alpha(p_{\theta_0}\|p_{\theta_1}) = (1-\alpha)\,\mathrm{KL}(p_{\theta_0} \| p_{\theta_\alpha}) + \alpha\,\mathrm{KL}(p_{\theta_1} \| p_{\theta_\alpha}),$$

where $\theta_\alpha = (1-\alpha)\theta_0 + \alpha\theta_1$ (1904.04017).
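
As a sanity check of this natural-parameter interpolation, the following sketch (an illustration, not a reference implementation) computes the GJS between two Poisson distributions, whose natural parameter is $\theta = \log\lambda$, both via the closed form above and by direct evaluation on a truncated support; SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import poisson

def gjs_poisson_closed_form(lam0, lam1, alpha=0.5):
    """GJS between Poisson(lam0) and Poisson(lam1) via natural parameters.

    The weighted geometric mean of two Poissons is again Poisson with rate
    lam_alpha = lam0**(1-alpha) * lam1**alpha (interpolation of the natural
    parameters theta = log(lambda)); KL(Poi(a) || Poi(b)) = a*log(a/b) + b - a.
    """
    lam_a = lam0 ** (1 - alpha) * lam1 ** alpha
    kl = lambda a, b: a * np.log(a / b) + b - a
    return (1 - alpha) * kl(lam0, lam_a) + alpha * kl(lam1, lam_a)

def gjs_poisson_numeric(lam0, lam1, alpha=0.5, n_max=60):
    """Direct evaluation of the definition on a truncated support."""
    x = np.arange(n_max)
    p, q = poisson.pmf(x, lam0), poisson.pmf(x, lam1)
    g = p ** (1 - alpha) * q ** alpha
    g /= g.sum()                          # normalize the geometric mean
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return (1 - alpha) * kl(p, g) + alpha * kl(q, g)

print(gjs_poisson_closed_form(3.0, 7.0))  # closed form via natural parameters
print(gjs_poisson_numeric(3.0, 7.0))      # numerical check (approximately equal)
```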

For multivariate Gaussians $N(\mu_i, C_i)$, the geometric mean yields

$$C_\alpha = \left[(1-\alpha)C_0^{-1} + \alpha C_1^{-1}\right]^{-1}, \qquad \mu_\alpha = C_\alpha\left((1-\alpha) C_0^{-1}\mu_0 + \alpha C_1^{-1}\mu_1\right),$$

and GJS admits a closed-form solution for the divergence (2006.10599, 2506.10494).
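
Under the stated Gaussian formulas, the divergence can be assembled from the standard closed-form KL between Gaussians; the sketch below is a minimal NumPy illustration (names such as `gjs_gauss` are ours, not from the cited papers).

```python
import numpy as np

def kl_gauss(mu0, C0, mu1, C1):
    """KL(N(mu0, C0) || N(mu1, C1)) for multivariate Gaussians."""
    d = mu0.shape[0]
    C1_inv = np.linalg.inv(C1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(C1_inv @ C0) + diff @ C1_inv @ diff
                  - d + np.log(np.linalg.det(C1) / np.linalg.det(C0)))

def gjs_gauss(mu0, C0, mu1, C1, alpha=0.5):
    """Closed-form GJS_alpha between two multivariate Gaussians.

    The geometric mean of the two Gaussians is N(mu_alpha, C_alpha) with
    C_alpha = [(1-alpha) C0^{-1} + alpha C1^{-1}]^{-1} and
    mu_alpha = C_alpha ((1-alpha) C0^{-1} mu0 + alpha C1^{-1} mu1).
    """
    P0, P1 = np.linalg.inv(C0), np.linalg.inv(C1)      # precision matrices
    C_a = np.linalg.inv((1 - alpha) * P0 + alpha * P1)
    mu_a = C_a @ ((1 - alpha) * P0 @ mu0 + alpha * P1 @ mu1)
    return ((1 - alpha) * kl_gauss(mu0, C0, mu_a, C_a)
            + alpha * kl_gauss(mu1, C1, mu_a, C_a))

mu0, C0 = np.zeros(2), np.eye(2)
mu1, C1 = np.array([1.0, -1.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
print(gjs_gauss(mu0, C0, mu1, C1, alpha=0.5))
```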

3. Metric and Geometric Properties

The GJS inherits and extends important geometric properties of the classical JSD:

  • Metric property: The square root of the Geometric Jensen-Shannon divergence, when defined over probability vectors or quantum density matrices, is a metric: it is symmetric, non-negative, satisfies the triangle inequality, and vanishes iff the arguments coincide (1105.2707, 1910.10447, 1911.02643); a numerical sanity check follows the table below.
  • Information geometry: For exponential families, GJS is aligned with the geometry defined by the log-partition function and the associated Bregman divergences. The corresponding mean parameterization is natural for mixture modeling and clustering (1904.04017).
  • Hierarchical extension: The GJS can be generalized to families of divergences parametrized by the skew parameter $\alpha$, or further via vector-skewings and abstract means (e.g., the harmonic mean for Cauchy distributions, as in (1904.04017, 1912.00610)), yielding flexible, tunable geometries.

Table: Metric Properties for Jensen-Shannon-type Divergences

| Divergence | Domain | Metric after square root? | Reference |
|---|---|---|---|
| Classical JSD | Probability simplex | Yes | (1105.2707) |
| Geometric JSD (GJS) | Exponential family / Gaussian | Yes | (1904.04017, 2506.10494) |
| Quantum JSD (von Neumann) | Density matrices / HPD matrices | Yes | (1910.10447, 1911.02643) |
| Tsallis/Generalized JSD | Probability simplex / HPD matrices | Yes (parametric family) | (1911.02643, 0804.1653) |
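
As an empirical illustration of the metric claim in the first row of the table (not a proof), the following sketch samples random probability vectors and checks the triangle inequality for the square root of the classical JSD.

```python
import numpy as np

def jsd(p, q):
    """Classical Jensen-Shannon divergence between discrete distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
violations = 0
for _ in range(10_000):
    p, q, r = rng.dirichlet(np.ones(5), size=3)     # random probability vectors
    d = lambda a, b: np.sqrt(jsd(a, b))             # candidate metric: sqrt of JSD
    if d(p, r) > d(p, q) + d(q, r) + 1e-12:         # triangle inequality check
        violations += 1
print("triangle-inequality violations:", violations)  # expected: 0
```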

4. Computational Techniques and Algorithmic Implementations

The geometric structure of GJS enables:

  • Efficient computation for Gaussians and exponential families: All terms are reduced to operations involving means, covariances, or natural parameters; log-determinants, traces, and Euclidean (or Hilbert-space) norms are utilized.
  • Regularization mechanisms: In infinite-dimensional (Hilbert space) settings, log-determinant divergence and regularization techniques ensure the divergence remains well-defined and finite even when Gaussian measures are not mutually absolutely continuous (2506.10494).
  • Dimensionality reduction and sketching: Embedding techniques leverage GJS for distance-preserving mapping of distributions into low-dimensional Euclidean spaces while maintaining the geometric or simplex structure (1503.05225).
  • Clustering and mode-seeking: GJS serves as an objective for $k$-means-like clustering of distributions, and provides tractable centroid computation for parametric families (1904.04017, 1912.00610); see the sketch after this list.
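
The clustering bullet above can be illustrated with a toy $k$-means-style loop over univariate Gaussians, where distances are the closed-form GJS and centroids are obtained by averaging natural parameters; this centroid update is a simplification chosen for illustration and is not the centroid construction derived in (1904.04017).

```python
import numpy as np

def gjs_gauss1d(m0, v0, m1, v1, alpha=0.5):
    """Closed-form GJS between univariate Gaussians N(m0, v0) and N(m1, v1)."""
    kl = lambda ma, va, mb, vb: 0.5 * (va / vb + (mb - ma) ** 2 / vb - 1 + np.log(vb / va))
    p0, p1 = 1 / v0, 1 / v1                     # precisions
    va = 1 / ((1 - alpha) * p0 + alpha * p1)    # geometric-mean variance
    ma = va * ((1 - alpha) * p0 * m0 + alpha * p1 * m1)
    return (1 - alpha) * kl(m0, v0, ma, va) + alpha * kl(m1, v1, ma, va)

def gjs_kmeans(params, k=2, n_iter=20, seed=0):
    """k-means-style clustering of 1-D Gaussians (rows of params = [mean, var]).

    Distances use the closed-form GJS; centroids are updated by averaging the
    natural parameters (1/var, mean/var) of the assigned Gaussians -- a simple
    illustrative choice, not the centroid construction of the cited papers.
    """
    rng = np.random.default_rng(seed)
    centers = params[rng.choice(len(params), k, replace=False)].copy()
    for _ in range(n_iter):
        d = np.array([[gjs_gauss1d(m, v, cm, cv) for cm, cv in centers]
                      for m, v in params])
        labels = d.argmin(axis=1)
        for j in range(k):
            members = params[labels == j]
            if len(members):
                eta1 = np.mean(1 / members[:, 1])              # mean precision
                eta2 = np.mean(members[:, 0] / members[:, 1])  # mean (precision * mean)
                centers[j] = [eta2 / eta1, 1 / eta1]
    return labels, centers

params = np.array([[0.0, 1.0], [0.2, 1.2], [5.0, 0.5], [5.5, 0.7]])
labels, centers = gjs_kmeans(params, k=2)
print(labels, centers)
```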

5. Applications in Machine Learning and Statistics

  • Variational regularization: In VAEs and Bayesian neural networks, GJS is used as a replacement for KL-divergence, providing numerical stability, symmetry, and improved generalization (especially under heavy-tailed or non-Gaussian posteriors), and enabling explicit interpolation between forward and reverse KL regimes via the skew parameter (2006.10599, 2209.11366); a minimal regularizer sketch follows this list.
  • Two-sample testing and GANs: Representation-based generalizations of JSD (such as the RJSD) leverage uncentered covariance operators in RKHS to circumvent density estimation, providing robust two-sample tests and effective divergence objectives for generative modeling (2305.16446).
  • Geometry-driven structural analysis: In the quantification of symmetry breaking in physical systems, GJS is used as a geometric measure to compare atomic densities before and after symmetry operations, yielding a sensitive, continuous measure of geometric deviation (2410.21880).
  • Quantum information: The quantum GJS (via density matrices) provides a metric on quantum state space, extending the geometric aspects of the classical case to operator-algebraic settings (1910.10447, 1911.02643).
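
As an illustration of the variational-regularization use mentioned above, the sketch below evaluates the closed-form GJS between a diagonal Gaussian posterior and a standard normal prior, the quantity that would replace the KL term in a VAE loss. It is written in plain NumPy for clarity (an autodiff framework would be used in practice), and the function names are ours.

```python
import numpy as np

def kl_diag_gauss(m0, v0, m1, v1):
    """Sum of elementwise KL(N(m0, v0) || N(m1, v1)) for diagonal Gaussians."""
    return 0.5 * np.sum(v0 / v1 + (m1 - m0) ** 2 / v1 - 1 + np.log(v1 / v0))

def gjs_regularizer(mu, log_var, alpha=0.5):
    """GJS_alpha(q(z|x) || p(z)) for a diagonal Gaussian posterior
    N(mu, diag(exp(log_var))) and a standard normal prior.

    Because all covariances are diagonal, the geometric-mean Gaussian is also
    diagonal and the divergence is a sum of per-dimension closed forms.
    """
    v = np.exp(log_var)
    # geometric-mean Gaussian between posterior (weight 1-alpha) and prior (weight alpha)
    v_a = 1.0 / ((1 - alpha) / v + alpha)   # prior precision is 1
    m_a = v_a * (1 - alpha) * mu / v        # prior mean is 0
    return ((1 - alpha) * kl_diag_gauss(mu, v, m_a, v_a)
            + alpha * kl_diag_gauss(np.zeros_like(mu), np.ones_like(v), m_a, v_a))

mu = np.array([0.5, -0.3, 1.2])
log_var = np.array([-0.2, 0.1, -1.0])
print(gjs_regularizer(mu, log_var, alpha=0.5))
```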

6. Extension to Infinite Dimensions and Regularization

For Gaussian measures on infinite-dimensional Hilbert space, the GJS requires careful regularization to deal with the absence of Lebesgue measure and the potential non-existence of (classical) KL divergence. The regularized GJS employs log-determinant divergences of trace-class operators and Fredholm determinants:

$$\operatorname{GJS}_\alpha^\gamma(\mu_0 \| \mu_1) = \text{(quadratic mean terms)} + (1-\alpha)\, d^1_{\log\det}(C_0 + \gamma I,\, C_{\alpha,\gamma}) + \alpha\, d^1_{\log\det}(C_1 + \gamma I,\, C_{\alpha,\gamma}),$$

as described precisely in (2506.10494). As the regularization parameter $\gamma \to 0$, the expression recovers the exact divergence for equivalent measures.
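
The following is a finite-dimensional toy analogue of this regularization, assuming only that $\gamma I$ is added to each covariance before the closed-form Gaussian GJS of Section 2 is evaluated; it is not the Hilbert-space formula of (2506.10494), merely an illustration of why the shift by $\gamma I$ keeps the inverses and log-determinants well-defined for singular covariances.

```python
import numpy as np

def kl_gauss(mu0, C0, mu1, C1):
    """KL(N(mu0, C0) || N(mu1, C1)) for multivariate Gaussians."""
    d = len(mu0)
    P1 = np.linalg.inv(C1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(P1 @ C0) + diff @ P1 @ diff - d
                  + np.log(np.linalg.det(C1) / np.linalg.det(C0)))

def gjs_gauss_regularized(mu0, C0, mu1, C1, alpha=0.5, gamma=1e-3):
    """Toy finite-dimensional analogue of the regularized GJS: add gamma*I to
    each covariance so that inverses and log-determinants exist even when the
    original covariances are singular, then evaluate the Gaussian closed form."""
    I = np.eye(len(mu0))
    C0g, C1g = C0 + gamma * I, C1 + gamma * I
    P0, P1 = np.linalg.inv(C0g), np.linalg.inv(C1g)
    C_a = np.linalg.inv((1 - alpha) * P0 + alpha * P1)
    mu_a = C_a @ ((1 - alpha) * P0 @ mu0 + alpha * P1 @ mu1)
    return ((1 - alpha) * kl_gauss(mu0, C0g, mu_a, C_a)
            + alpha * kl_gauss(mu1, C1g, mu_a, C_a))

# Rank-deficient covariances: without gamma > 0 the inverses would not exist.
C0 = np.array([[1.0, 1.0], [1.0, 1.0]])
C1 = np.array([[2.0, 0.0], [0.0, 0.0]])
print(gjs_gauss_regularized(np.zeros(2), C0, np.array([1.0, 0.0]), C1))
```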

7. Connections, Generalizations, and Theoretical Significance

The GJS extends classical divergence families, connecting convexity-based generalizations (e.g., $q$-convexity for Tsallis entropies (0804.1653)), symmetrized Bregman divergences, and information geometric perspectives (canonical and potential divergences in dually flat spaces (1808.06482)). It underpins dense hierarchies of inequalities among divergences (1111.6372), and admits flexible parameterization (including monoparametric and vector-skew families (1709.10153, 1912.00610)). Its metric character, geometric mean closure, and strong geometric interpretability make it foundational for both theoretical exploration and real-world practice in statistical inference, machine learning, signal processing, and quantum information.

