Information Geometry

Updated 30 December 2025
  • Information Geometry is a field that uses differential geometry to describe statistical manifolds via the Fisher information metric and divergence functions.
  • It employs dual affine connections and α-connections to characterize estimation efficiency and optimization in both parametric and nonparametric models.
  • Its practical applications include natural gradient descent, Cramér-Rao efficiency in estimation, and quantum Fisher metrics for advanced statistical inference.

Information geometry is the differential-geometric treatment of spaces of probability distributions, equipping these spaces—statistical manifolds—with a Riemannian metric and dual affine connections canonically derived from statistical inference principles. The foundational metric is the Fisher information metric, which encodes statistical distinguishability and underpins the geometry of parametric and nonparametric models, estimation efficiency, optimization, and inference. The dualistic structure, via α-connections (notably the exponential and mixture connections), admits global dually flat coordinates for exponential families and more generally extends to quantum systems, stochastic dynamical models, compositional data, optimization algorithms, and connections with optimal transport.

1. Statistical Manifolds, Fisher Metric, and Dual Connections

A statistical manifold $\mathcal{M}$ is a smooth parameterized family of distributions $\{p(x;\theta)\}$, with the parameter domain $\Theta\subset\mathbb{R}^d$ serving as the manifold coordinates. The Fisher information metric $g$ on the tangent space $T_\theta\mathcal{M}$ is defined by

$$g_{ij}(\theta) = \mathbb{E}_{p(\cdot;\theta)}\left[\partial_i \log p(X;\theta)\,\partial_j \log p(X;\theta)\right],$$

where $\partial_i = \partial/\partial\theta^i$ and the expectation is taken under $p(\cdot;\theta)$ (Caticha, 2014, Pistone, 4 Feb 2025, Mishra et al., 2023, Ay et al., 2012).
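As a concrete illustration (a minimal sketch, not code from the cited papers), the definition can be evaluated directly for a Bernoulli model, where the Fisher information is known in closed form as $1/(\theta(1-\theta))$:

```python
import numpy as np

def fisher_information_bernoulli(theta):
    """Fisher information E[(d/dtheta log p(X; theta))^2] for a Bernoulli(theta) model."""
    scores = np.array([-1.0 / (1.0 - theta), 1.0 / theta])  # score at x = 0 and x = 1
    probs = np.array([1.0 - theta, theta])
    return float(np.sum(probs * scores**2))

theta = 0.3
print(fisher_information_bernoulli(theta))  # ~4.7619
print(1.0 / (theta * (1.0 - theta)))        # closed-form Fisher information
```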

Dual affine connections $\nabla^{(\alpha)}$ parameterized by $\alpha\in\mathbb{R}$ are defined via Christoffel symbols

$$\Gamma^{(\alpha)\,k}_{ij} = \mathbb{E}_\theta\left[\partial_i\partial_j \log p\;\partial^k \log p\right] + \frac{1-\alpha}{2}\,\mathbb{E}_\theta\left[\partial_i\log p\;\partial_j\log p\;\partial^k\log p\right],$$

with $\nabla^{(1)}$ (the exponential connection) and $\nabla^{(-1)}$ (the mixture connection) being dual with respect to $g$ (Pistone, 4 Feb 2025, Nielsen, 2018, Mishra et al., 2023, Itoh et al., 2022).

Exponential families $p(x;\theta) = h(x)\exp\{\theta^i T_i(x) - \psi(\theta)\}$ imbue the manifold with dually flat geometry: exponential ($\nabla^{(1)}$) coordinates $\theta$ and mixture ($\nabla^{(-1)}$) coordinates $\eta = \mathbb{E}_\theta[T(X)]$, related by Legendre duality (Mishra et al., 2023, Caticha, 2014).
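A minimal numerical sketch of this Legendre duality for the Bernoulli family in its natural parameterization (the specific family and values are illustrative assumptions):

```python
import numpy as np

# Bernoulli as an exponential family p(x; theta) = exp(theta*x - psi(theta)),
# with cumulant (log-partition) psi(theta) = log(1 + e^theta).
grad_psi = lambda t: 1.0 / (1.0 + np.exp(-t))  # eta = E_theta[T(X)], the mixture coordinate
grad_phi = lambda e: np.log(e / (1.0 - e))     # gradient of the Legendre dual, mapping eta back to theta

theta = 0.7
eta = grad_psi(theta)      # exponential -> mixture coordinates
print(eta, grad_phi(eta))  # round trip recovers theta = 0.7
```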

The unique status of the Fisher metric is established by Chentsov's theorem: up to scale, it is the only metric invariant under sufficient statistics and Markov morphisms, on finite and infinite sample spaces, and is thus the canonical metric for statistical models (Ay et al., 2012, Erb et al., 2020).

2. Divergences, Bregman Geometry, Projections, and Geodesics

Divergence functions $D(P\|Q)$ generalize distances on the manifold $\mathcal{M}$, yielding a metric and dual connections via derivatives:

$$g_{ij}(\theta) = -\left.\partial_{\theta_i}\partial_{\theta_j'} D(\theta\|\theta')\right|_{\theta'=\theta}, \qquad \Gamma_{ijk} = -\left.\partial_{\theta_i}\partial_{\theta_j}\partial_{\theta_k'} D(\theta\|\theta')\right|_{\theta'=\theta}$$

(Mishra et al., 2023, Nielsen, 2018, Wong et al., 2019).
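To illustrate (an assumption-laden sketch rather than code from the cited papers), the Fisher metric of a Bernoulli model can be recovered numerically as the negative mixed Hessian of the KL divergence:

```python
import numpy as np

def kl_bernoulli(t, s):
    """KL(p_t || p_s) for Bernoulli mean parameters t and s."""
    return t * np.log(t / s) + (1 - t) * np.log((1 - t) / (1 - s))

def metric_from_divergence(D, theta, h=1e-4):
    """g(theta) = -d^2 D(theta||theta') / (dtheta dtheta') at theta' = theta, by finite differences."""
    mixed = (D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4 * h**2)
    return -mixed

theta = 0.3
print(metric_from_divergence(kl_bernoulli, theta))  # ~4.76
print(1.0 / (theta * (1.0 - theta)))                # Fisher information of the Bernoulli model
```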

Bregman divergences for a strictly convex $\psi:\Theta\to\mathbb{R}$,

$$D_\psi(x\|y) = \psi(x) - \psi(y) - \langle\nabla\psi(y),\,x-y\rangle,$$

induce the Riemannian metric $g(x) = \nabla^2\psi(x)$ and dual coordinates via the Legendre transform $\phi$ (Raskutti et al., 2013, Nielsen, 2018, Mishra et al., 2023, Erb et al., 2020).
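For example (an illustrative sketch, not taken from the cited papers), the negative-entropy generator on the probability simplex recovers the KL divergence as a Bregman divergence:

```python
import numpy as np

def bregman(psi, grad_psi, x, y):
    """D_psi(x || y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    return psi(x) - psi(y) - float(np.dot(grad_psi(y), x - y))

neg_entropy = lambda p: float(np.sum(p * np.log(p)))  # generator psi on the simplex
grad_neg_entropy = lambda p: np.log(p) + 1.0

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
print(bregman(neg_entropy, grad_neg_entropy, p, q))  # Bregman divergence with psi = negative entropy
print(float(np.sum(p * np.log(p / q))))              # equals KL(p || q) on the simplex
```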

The KL divergence $D_{\mathrm{KL}}(p\|q) = \int p(x)\log\big(p(x)/q(x)\big)\,dx$ is central, with its second derivative yielding the Fisher metric (Caticha, 2014, Mishra et al., 2023, Erb et al., 2020, Ay et al., 2012).

The generalized Pythagorean theorem for Bregman/KL divergences,

$$D_\psi(x''\|x') + D_\psi(x'\|x) = D_\psi(x''\|x),$$

holds when the primal/dual geodesics joining the points are orthogonal with respect to $g$—a property with extensive implications for information projection, statistical inference, and variational optimization (Caticha, 2014, Nielsen, 2018, Amari et al., 2017, Oizumi et al., 2015).
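As a concrete check (an illustrative sketch; the constraint set and numbers are assumptions), the identity can be verified for the information projection of a distribution onto a mixture-flat (affine) constraint set, where it holds for every member of the set:

```python
import numpy as np
from scipy.optimize import minimize

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# Reference distribution x and a mixture-flat (affine) constraint set:
# distributions q on 4 points with sum(q) = 1 and a prescribed mean of the statistic T.
x = np.array([0.5, 0.3, 0.15, 0.05])
T = np.array([0.0, 1.0, 2.0, 3.0])
constraints = [{"type": "eq", "fun": lambda q: q.sum() - 1.0},
               {"type": "eq", "fun": lambda q: T @ q - 1.2}]
bounds = [(1e-9, 1.0)] * 4

# Information projection x' = argmin_{q in the set} KL(q || x).
x_proj = minimize(lambda q: kl(q, x), x0=np.full(4, 0.25),
                  bounds=bounds, constraints=constraints).x

# Any other member x'' of the constraint set (sums to 1, mean of T equals 1.2).
x2 = np.array([0.30, 0.35, 0.20, 0.15])

print(kl(x2, x_proj) + kl(x_proj, x))  # D(x''||x') + D(x'||x)
print(kl(x2, x))                       # D(x''||x): equal up to solver tolerance
```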

Geodesics under the dual connections are straight lines in the corresponding coordinates: $\theta_t = (1-t)\theta_0 + t\theta_1$ (exponential) and $\eta_t = (1-t)\eta_0 + t\eta_1$ (mixture) (Mishra et al., 2023, Nielsen, 2018, Caticha, 2014). On dually flat manifolds geodesics admit closed-form solutions; the Riemannian (Levi-Civita) geodesic is a curve interpolating between the exponential and mixture geodesics (Itoh et al., 2022).
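A minimal sketch of the two geodesics between discrete distributions (the endpoints are illustrative assumptions): the mixture geodesic interpolates probabilities linearly, while the exponential geodesic interpolates log-probabilities and renormalizes:

```python
import numpy as np

p0 = np.array([0.7, 0.2, 0.1])
p1 = np.array([0.1, 0.3, 0.6])

def m_geodesic(t):
    """Mixture geodesic: a straight line in the probabilities themselves."""
    return (1 - t) * p0 + t * p1

def e_geodesic(t):
    """Exponential geodesic: a straight line in log-probabilities, renormalized."""
    q = p0**(1 - t) * p1**t
    return q / q.sum()

print(m_geodesic(0.5))
print(e_geodesic(0.5))  # the two midpoints differ; the Levi-Civita geodesic lies between them
```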

3. Sufficient Statistics, Invariance, and Canonicality

A sufficient statistic $\kappa:\Omega\to\Omega'$ for a model $(M, \Omega, \mu_0, p(\cdot))$ ensures that the Fisher metric and the Amari-Chentsov tensor are preserved under the pushforward model:

$$g_F^p = g_F^{\kappa_* p},\qquad T_{AC}^p = T_{AC}^{\kappa_* p}$$

(Ay et al., 2012). For any non-sufficient statistic, monotonicity holds instead:

$$g_F^{\kappa_* p}(V,V) \leq g_F^p(V,V).$$

Monotonicity follows from invariance under sufficient statistics, extending Chentsov's finite-sample uniqueness result to infinite dimensions (Ay et al., 2012).
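A small numerical illustration of monotonicity under a non-sufficient statistic (the two-coin model is an assumption for exposition): coarse-graining the outcome of two coin flips strictly reduces the Fisher information about the bias:

```python
import numpy as np

def fisher_info(model, theta, h=1e-5):
    """Fisher information of a finite model theta -> probability vector, via finite-difference scores."""
    p = model(theta)
    dp = (model(theta + h) - model(theta - h)) / (2 * h)
    return float(np.sum(dp**2 / p))

# Two independent coin flips with head-probability theta (outcomes: 0, 1, or 2 heads).
full = lambda t: np.array([(1 - t)**2, 2 * t * (1 - t), t**2])
# Non-sufficient coarse graining: only record whether both flips were heads.
coarse = lambda t: np.array([1 - t**2, t**2])

theta = 0.3
print(fisher_info(full, theta))    # 2/(theta(1-theta)) ~ 9.52
print(fisher_info(coarse, theta))  # 4/(1-theta^2)      ~ 4.40  <=  full model (monotonicity)
```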

The only Riemannian metric (up to an overall scale) on the space of discrete probability distributions (the simplex) that is invariant under sufficient statistics (Markov morphisms, coarse graining, reparametrization) is the Fisher information metric (Erb et al., 2020, Ay et al., 2012).

4. Applications: Estimation, Optimization, Inference, Quantum Generalizations

Efficient Estimation and Cramér-Rao Bound

The inverse Fisher metric yields the Cramér-Rao lower bound for unbiased estimators (asymptotically achieved by MLE) (Mishra et al., 2023, Caticha, 2014).
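A quick Monte Carlo sanity check (illustrative, not from the cited papers): the variance of the Bernoulli MLE approaches the inverse Fisher information $\theta(1-\theta)/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 0.3, 500, 20000

# The MLE of a Bernoulli parameter is the sample mean.
mle = rng.binomial(1, theta, size=(trials, n)).mean(axis=1)

print(n * mle.var())        # ~ theta(1-theta): scaled MLE variance approaches the bound
print(theta * (1 - theta))  # inverse Fisher information, 1/I(theta)
```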

Optimization Algorithms

Mirror descent—using Bregman divergences—is equivalent to natural gradient descent on the dual Riemannian manifold: the update $\mu_{t+1} = \mu_t - \eta\,\nabla f(x_t)$ is steepest descent under the dual metric $G(\mu) = \nabla^2\phi(\mu)$ (Raskutti et al., 2013).

The natural gradient update $\theta_{t+1} = \theta_t - \eta\,[I(\theta_t)]^{-1}\nabla f(\theta_t)$ parallels mirror descent; for exponential families, mirror descent with the corresponding Bregman divergence is precisely the first-order implementation of natural gradient descent. These methods achieve Fisher asymptotic efficiency (the Cramér-Rao bound) (Malagò et al., 2014, Raskutti et al., 2013, Mishra et al., 2023).
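A minimal sketch of this equivalence for a one-parameter Bernoulli family (the objective and step size are illustrative assumptions): natural gradient descent in the natural parameter and mirror descent in the mean parameter, with a negative-entropy mirror map, generate identical iterates:

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
logit = lambda m: np.log(m / (1.0 - m))

f_grad = lambda m: 2.0 * (m - 0.8)  # gradient of the toy objective f(mu) = (mu - 0.8)^2
eta, steps = 0.5, 5

# Natural gradient descent in the natural parameter theta.
theta = 0.0
for _ in range(steps):
    mu = sigmoid(theta)
    fisher = mu * (1.0 - mu)            # I(theta) = psi''(theta) for the Bernoulli family
    grad_theta = f_grad(mu) * fisher    # chain rule: d f(mu(theta)) / d theta
    theta -= eta * grad_theta / fisher  # preconditioning by I(theta)^{-1}

# Mirror descent in the mean parameter mu with a negative-entropy mirror map.
mu = 0.5
for _ in range(steps):
    mu = sigmoid(logit(mu) - eta * f_grad(mu))  # dual-coordinate (natural-parameter) update

print(sigmoid(theta), mu)  # identical iterates
```

Both loops implement the same dual-coordinate recursion $\theta_{t+1} = \theta_t - \eta\,f'(\mu_t)$, which is the content of the equivalence.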

Quantum Information Geometry

Quantum Fisher information yields monotone Riemannian metrics (Petz classification). The Bures metric for mixed states is the quantum analogue of the 2-Wasserstein distance, and quantum Cramér-Rao bounds govern precision in estimation (Bohra et al., 2021, Mishra et al., 2023). Information geometry also reconstructs quantum theory from Fisher geometry plus physical postulates (complementarity, measurement simulability, gauge invariance), leading to complex Hilbert space and unitary dynamics (0805.2770).
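As an illustration (a sketch under the standard definitions of Uhlmann fidelity and Bures distance; the density matrices are arbitrary assumptions):

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_distance(rho, sigma):
    """Bures distance between density matrices via the Uhlmann fidelity."""
    s = sqrtm(rho)
    fidelity = np.real(np.trace(sqrtm(s @ sigma @ s)))**2
    return float(np.sqrt(max(2.0 - 2.0 * np.sqrt(fidelity), 0.0)))

rho = np.array([[0.7, 0.0], [0.0, 0.3]])
sigma = np.array([[0.6, 0.1], [0.1, 0.4]])
print(bures_distance(rho, sigma))
```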

Other Domains

Applications include compositional data analysis (CoDA), evolutionary game theory (Shahshahani geometry), information integration quantification, statistical physics, radar signal processing, machine learning, and analysis of compositional mixtures (Erb et al., 2020, 0911.1383, Oizumi et al., 2015, Amari et al., 2017, Malagò et al., 2014, Caticha, 2015).

5. Geometry of Probability Measures and Infinite-dimensional Extensions

On a compact manifold $M$, the space of smooth probability densities $P^+(M)$ admits the Fisher-Rao metric

$$G_\mu(\tau_1,\tau_2) = \int_M \frac{h_1(x)\,h_2(x)}{f(x)}\,d\lambda(x),$$

with tangent vectors $\tau_i = h_i\,\lambda$ at the base density $\mu = f\,\lambda$ (Itoh et al., 2022). The Fisher metric is the Hessian of the KL divergence, and the Levi-Civita connection yields Riemannian geodesics expressible as closed-form curves via the square-root map embedding into $L^2$ spheres.
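A discrete analogue makes the square-root embedding concrete (an illustrative sketch, not code from the cited papers): square roots of probability vectors lie on a sphere, and the Fisher-Rao geodesic distance is the corresponding great-circle arc length $2\arccos\sum_x\sqrt{p(x)q(x)}$:

```python
import numpy as np

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between discrete distributions via the square-root embedding."""
    cos_angle = np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0)  # inner product of sqrt(p) and sqrt(q)
    return float(2.0 * np.arccos(cos_angle))                # great-circle arc length on the sphere

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
print(fisher_rao_distance(p, q))
```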

Dualistic α-connections extend to infinite dimensions; the geometry encodes statistical distinguishability, invariance under diffeomorphisms, and characterizes barycenter maps on the ideal boundary of Hadamard manifolds (Itoh et al., 2022).

Nonparametric information geometry generalizes the finite-dimensional concepts using statistical bundles—pairs of a density and a score—together with parallel transport and exponential charts. The KL divergence is generated via affine-geometric transformations, and its second-order Taylor expansion recovers the Fisher metric as the Hessian (Pistone, 4 Feb 2025).

6. Interactions, Information Integration, and Hierarchies

Information geometry provides a unified language for interaction measures via KL projection onto constraint manifolds. Integrated information, transfer entropy, and stochastic interaction quantify "broken" causal links as the divergence from the full joint distribution to a split-model submanifold:

$$\Phi_C(P) := \min_{q\in\mathcal{M}_C} D(P\|q)$$

for constraints $C$ encoding independence/conditional-independence relations. Hierarchies of measures arise via inclusion of submanifolds, with properties deduced from the geometry (e.g., nonnegativity, bounds by mutual information) (Oizumi et al., 2015, Amari et al., 2017).
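For instance (an illustrative sketch, not code from the cited papers), when $\mathcal{M}_C$ is the manifold of product distributions, the minimizer is the product of the marginals and $\Phi_C$ reduces to the mutual information $I(X;Y)$:

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# Joint distribution P(X, Y) on a 2 x 3 alphabet.
P = np.array([[0.10, 0.20, 0.05],
              [0.30, 0.15, 0.20]])

# The m-projection of P onto the independence manifold is the product of its marginals.
q_star = np.outer(P.sum(axis=1), P.sum(axis=0))

# Phi for the full-independence constraint equals the mutual information I(X; Y).
print(kl(P, q_star))
```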

In compositional data analysis, the Fisher metric is uniquely suited to measuring statistical distance, respects invariance under coarse graining, and is foundational for maximum-entropy and projection theorems. It contrasts with the Euclidean (Aitchison) metric, and the monotonicity of statistical divergence under amalgamation is rigorously established (Erb et al., 2020).

7. Curvature, Criticality, and Connections with Optimal Transport

The scalar curvature of the information metric (Ruppeiner metric, quantum fidelity susceptibility) encodes the correlation volume (criticality) in statistical and quantum systems, $R \sim \xi^D$. Scaling relations among curvature, geodesic distance, and correlation length are prescribed by RG beta-functions and the parameter-manifold geometry (Maity et al., 2015). Notably, in quantum statistical models the scalar curvature need not diverge at phase transitions, as demonstrated in the geometry of Bose-Einstein condensation (Pessoa et al., 2021).

Optimal transport and information geometry intersect via c-divergences and the induced pseudo-Riemannian metric, with dual connections and curvature tensors arising from the Levi-Civita connection of the ambient pseudo-metric. Bregman and $L^{(\alpha)}$-divergences correspond to dually flat and constant-curvature structures, respectively, with the geodesic-induced curvature expressing the Ma-Trudinger-Wang (MTW) regularity condition (Wong et al., 2019).


Table: Key Information-Geometric Objects

| Object | Definition/Formulation | Domain/Significance |
|---|---|---|
| Fisher metric | $g_{ij}=\mathbb{E}_\theta[\partial_i \log p\,\partial_j \log p]$ | Statistical distinguishability |
| α-connections | $\Gamma^{(\alpha)}$ via score and cubic tensor | Dualistic structure |
| KL/Bregman divergence | $D_\psi(x\Vert y)=\psi(x)-\psi(y)-\langle\nabla\psi(y),\,x-y\rangle$ | Statistical "distance"/projection |
| Exponential/mixture coordinates | $\theta$/$\eta$ linked via Legendre duality | Dually flat geometry |
| Sufficient statistics | Compression preserving Fisher metric and cubic tensor | Canonicality, monotonicity |
| Mirror/natural gradient | First-order update via dual coordinates or Hessian inversion | Efficient optimization |

Information geometry is thus a canonical, rigorous framework for understanding statistical inference, learning, dynamical inference, and physical models from the perspective of differential geometry and metric theory, with deep unifying links to quantum mechanics, data analysis, optimization, and geometry of probability spaces.
