Extended Information Geometry Overview
- Extended Information Geometry is a framework that extends classical statistical divergence methods to include quantum, conditional, and compositional models using novel divergence functions.
- It employs generalized divergence functions, conformal and pseudo-Riemannian metrics, and dual affine connections to analyze systems beyond traditional probability distributions.
- The unified geometric toolkit aids in model selection, parameter estimation, and uncertainty quantification across applications from quantum physics to machine learning.
Extended information geometry generalizes the differential-geometric structures of classical information geometry, originally formulated for statistical manifolds of probability distributions, to a broader class of objects including non-statistical systems, quantum models, conditional and compositional data, and geometries governed by non-Fisher divergences and connections. This extension encompasses new types of divergence functions, conformal and pseudo-Riemannian metrics, and dualistic structures beyond the classical dually flat paradigm. The field synthesizes approaches from information theory, machine learning, quantum estimation, optimal transport, large deviation theory, and computational methods, yielding a unified geometric toolkit adaptable to both statistical and non-statistical modeling frameworks.
1. Generalized Divergence Functions and Metric Structures
Extended information geometry begins by detaching from the strict probabilistic interpretation of divergence. A divergence function $D(u, \theta)$ is assumed, defined on a data space $\mathcal{U}$ and a manifold of models $\mathcal{M} = \{m(\theta)\}$, satisfying non-negativity, smoothness in the model parameters, and orthogonal projection properties. The local metric is induced as the Hessian of the divergence at the projection point: $g_{ij}(\theta) = \partial_{\theta^i}\partial_{\theta^j} D(u, \theta)\,\big|_{u = m(\theta)}$. This framework recovers the classical Fisher information metric when $D$ is the Kullback–Leibler divergence, but the same structure applies for generalized divergences (e.g., Rényi, quantum, Bregman, optimal transport) and for data or model manifolds not representable as probability densities (Naudts et al., 2015). The definition of the metric makes minimal requirements on the nature of $\mathcal{U}$ and $\mathcal{M}$, allowing, for example, density matrices or occupation number sequences as "data."
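As a minimal numerical sketch (an illustration, not code from the cited works), the Hessian construction can be checked on the Bernoulli family, where the KL-induced metric should reproduce the Fisher information $1/(\theta(1-\theta))$; the finite-difference step `h` is an arbitrary choice:

```python
import numpy as np

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(p || q) between Bernoulli laws."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def divergence_metric(div, theta, h=1e-4):
    """Metric induced by a divergence: second-argument Hessian on the
    diagonal (the projection point), via a central finite difference."""
    return (div(theta, theta + h) - 2 * div(theta, theta) + div(theta, theta - h)) / h**2

theta = 0.3
g = divergence_metric(kl_bernoulli, theta)
fisher = 1 / (theta * (1 - theta))  # Fisher information of Bernoulli(theta)
print(g, fisher)  # the two agree to finite-difference accuracy
```

The same `divergence_metric` helper applies unchanged to any other two-argument divergence, which is the point of the divergence-first formulation.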
2. Dual Affine Connections and Divergence-Induced Geometry
Affine connections generalize the concept of geodesics and "straightness" in the model manifold. For a divergence $D(\theta, \theta')$, cubic tensors of the first kind (Christoffel symbols) are derived, following Eguchi, as higher derivatives on the diagonal: $\Gamma_{ij,k} = -\partial_i\partial_j\partial'_k D\,\big|_{\theta'=\theta}$ and $\Gamma^*_{ij,k} = -\partial'_i\partial'_j\partial_k D\,\big|_{\theta'=\theta}$. Dual connections $\nabla$ and $\nabla^*$ appear naturally, with dual flatness (e.g., in exponential families) corresponding to vanishing curvature in suitable affine coordinates. Generalized information geometry admits non-flat dualistic structures: for example, the $c$-divergence geometry of optimal transport features nontrivial curvature (related to the Ma–Trudinger–Wang tensor), and the Rényi-divergence-induced geometry introduces a conformal factor in the metric and modifies the interpolation between mixture and exponential connections (Wong et al., 2019, Kuntz et al., 22 May 2025). These structures are essential in contexts such as quantum state manifolds and operator scaling, where non-commutative analogs of e- and m-connections must be introduced (Matsuda et al., 2020).
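These Eguchi-type formulas can be verified numerically in one dimension. The sketch below (written under the standard conventions, not taken from the cited papers) applies finite differences to the Bernoulli KL divergence in the mean parameter and checks the duality relation $\partial_k g_{ij} = \Gamma_{ki,j} + \Gamma^*_{kj,i}$:

```python
import numpy as np

def D(t, s):
    """KL divergence between Bernoulli(t) and Bernoulli(s), mean parameters."""
    return t * np.log(t / s) + (1 - t) * np.log((1 - t) / (1 - s))

h = 1e-3  # finite-difference step

def g(t):
    """Metric g = -d_t d_s D on the diagonal (mixed central difference)."""
    return -(D(t + h, t + h) - D(t + h, t - h)
             - D(t - h, t + h) + D(t - h, t - h)) / (4 * h * h)

def Gamma(t):
    """Primal cubic tensor: -d_t d_t d_s D on the diagonal."""
    ds = lambda a, b: (D(a, b + h) - D(a, b - h)) / (2 * h)  # d_s D
    return -(ds(t + h, t) - 2 * ds(t, t) + ds(t - h, t)) / h**2

def Gamma_star(t):
    """Dual cubic tensor: -d_s d_s d_t D on the diagonal."""
    dt = lambda a, b: (D(a + h, b) - D(a - h, b)) / (2 * h)  # d_t D
    return -(dt(t, t + h) - 2 * dt(t, t) + dt(t, t - h)) / h**2

t = 0.3
dg = (g(t + h) - g(t - h)) / (2 * h)     # derivative of the metric
print(dg, Gamma(t) + Gamma_star(t))      # duality: dg = Gamma + Gamma*
# Gamma(t) itself is ~0 here: the mean parameter is an affine coordinate
# for the induced (mixture-type) connection, illustrating flatness.
```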
3. Extension Beyond Statistical Models: Quantum, Conditional, and Non-Statistical Systems
Information geometry extends to:
- Quantum information geometry, where the data space comprises density matrices and divergences include the Umegaki (quantum relative entropy) and more general monotone divergences. Here, the Fisher-type metric is generalized to quantum information metrics (e.g., the symmetric logarithmic derivative metric), and the e-/m-geodesics and dual connections acquire non-commutative formulations (Naudts et al., 2015, Matsuda et al., 2020).
- Conditional models, where the product Fisher information metric is uniquely selected (up to scale) as the invariant Riemannian metric under congruent embeddings by Markov morphisms, extending Čencov/Campbell theorems to the manifold of positive conditional models (Lebanon, 2012).
- Compositional data analysis (CoDA), where the simplex is treated as a statistical manifold, and the Fisher metric (and associated divergences, such as α-divergences and Aitchison distance) are justified as unique invariant structures for non-Euclidean geometry of compositional vectors (Erb et al., 2020).
- Non-statistical systems, including the geometry of the Bose gas in the grand canonical ensemble and quantum measurement via conditional expectations, where the data does not correspond to probability distributions but the divergence-induced metric and projection properties still apply (Naudts et al., 2015).
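The quantum case can be made concrete with a short sketch of the Umegaki relative entropy on qubit density matrices (the formula is standard; the specific states are arbitrary examples), showing that it behaves as a divergence on non-probabilistic "data":

```python
import numpy as np

def umegaki(rho, sigma):
    """Umegaki relative entropy S(rho || sigma) = Tr[rho (log rho - log sigma)]."""
    def logm(a):
        # matrix logarithm of a positive-definite Hermitian matrix
        w, v = np.linalg.eigh(a)
        return v @ np.diag(np.log(w)) @ v.conj().T
    return np.trace(rho @ (logm(rho) - logm(sigma))).real

# Example qubit states (Hermitian, positive, unit trace), chosen arbitrarily.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])  # maximally mixed state

print(umegaki(rho, sigma))  # nonnegative, by Klein's inequality
print(umegaki(rho, rho))    # zero on the diagonal, as for any divergence
```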
4. New Families of Information-Geometric Structures
Extensions include:
- Rényi-induced geometry, where the Rényi divergence of order $\alpha$ generates a one-parameter family of conformally Fisher metrics, dual connections, Laplacians, and canonical priors. The metric tensor is $g^{(\alpha)} = \alpha\, g_F$, a conformal rescaling of the Fisher–Rao metric, and the dual connections interpolate between the exponential and mixture connections according to $\alpha$, with associated curvature properties. Rényi geometry is not symmetric in its two arguments, contrasting with Amari's $\alpha$-geometry (Kuntz et al., 22 May 2025).
- Geometry interpolating optimal transport and information, such as the entropy-regularized Wasserstein framework, which produces a continuous family of divergences and dual connections on the simplex, interpolating between the classic Fisher–KL and Wasserstein geometric regimes, with corresponding interpolation of metrics, connections, and geodesics (Amari et al., 2017, Wong et al., 2019).
| Geometry | Metric | Dual Connections | Canonical Priors |
|---|---|---|---|
| Amari ($\alpha$) | $g_F$ (Fisher–Rao) | $\nabla^{(\alpha)}$, $\nabla^{(-\alpha)}$ | |
| Rényi ($\alpha$) | $\alpha\, g_F$ (conformally Fisher) | $\nabla^{(\alpha)}_{\mathrm{R}}$, $\nabla^{(\alpha)*}_{\mathrm{R}}$ | covolumes $\omega^{(\alpha)}$, $\omega^{(\alpha)*}$ |
Canonical prior volume forms (covolumes) are defined as those parallel to the respective connections $\nabla^{(\alpha)}$ and $\nabla^{(\alpha)*}$; in particular, the family of Hartigan's priors coincides with the Rényi covolumes upon a suitable identification of parameters (Kuntz et al., 22 May 2025).
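The conformal factor in the Rényi metric can be checked numerically. Assuming the standard closed form of the Rényi divergence on the Bernoulli family (a sketch, not code from Kuntz et al.), its diagonal Hessian should equal $\alpha$ times the Fisher information:

```python
import numpy as np

def renyi_bernoulli(alpha, p, q):
    """Renyi divergence D_alpha(p || q) between Bernoulli(p) and Bernoulli(q)."""
    s = p**alpha * q**(1 - alpha) + (1 - p)**alpha * (1 - q)**(1 - alpha)
    return np.log(s) / (alpha - 1)

def hessian_in_q(div, p, h=1e-4):
    """Hessian of a divergence in its second argument, on the diagonal."""
    return (div(p, p + h) - 2 * div(p, p) + div(p, p - h)) / h**2

p, alpha = 0.3, 0.5
g_alpha = hessian_in_q(lambda a, b: renyi_bernoulli(alpha, a, b), p)
fisher = 1 / (p * (1 - p))
print(g_alpha, alpha * fisher)  # the Renyi metric is alpha times the Fisher metric
```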
5. Canonical Divergence and Recovery of Geometric Structure
For any dualistic statistical manifold with metric $g$ and dual connections $(\nabla, \nabla^*)$, a divergence can be constructed directly from the geometry: $D(p, q) = \int_0^1 \langle X_t, \dot\gamma(t)\rangle\, dt$, where $\gamma$ is the $\nabla$-geodesic from $p$ to $q$ and $X_t$ is the parallel transport of the generalized "difference vector" along the dual geodesic. This canonical divergence recovers the metric (as the leading quadratic term in its Taylor expansion) and the dual connection coefficients at cubic order. In the dually flat case, it coincides with the standard Bregman divergence; in the Riemannian (self-dual) or symmetric cases, it reduces to the half squared distance or the spring-work divergence, respectively (Felice et al., 2018).
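The reduction to a Bregman divergence in the dually flat case can be made concrete. The sketch below (standard exponential-family facts, not code from Felice et al.) verifies that the KL divergence between Bernoulli models equals the Bregman divergence of the log-partition function in natural coordinates:

```python
import numpy as np

def psi(eta):
    """Log-partition function of the Bernoulli family in the natural parameter."""
    return np.log1p(np.exp(eta))

def bregman(f, x, y, h=1e-6):
    """Bregman divergence B_f(x, y) = f(x) - f(y) - f'(y) (x - y)."""
    fprime = (f(y + h) - f(y - h)) / (2 * h)  # central-difference derivative
    return f(x) - f(y) - fprime * (x - y)

def kl_bernoulli(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

sigmoid = lambda eta: 1 / (1 + np.exp(-eta))
eta_p, eta_q = 0.4, -0.9  # arbitrary natural parameters
# In a dually flat family the canonical divergence is the Bregman divergence
# of the potential psi: KL(p || q) = B_psi(eta_q, eta_p).
print(kl_bernoulli(sigmoid(eta_p), sigmoid(eta_q)), bregman(psi, eta_q, eta_p))
```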
6. Applications: Quantum Fields, Markov Chains, and Model Selection
- Quantum fields: Information geometry extends to infinite-dimensional functional settings, with divergence functionals and associated Fisher metrics defined on manifolds of field probability laws. Functional derivatives of the divergence generate connected and 1PI correlators. Sanov's theorem generalizes to rate functions for empirical fields, governed by the functional KL divergence (Floerchinger, 2023).
- Markov chains: The exponential family of irreducible Markov chains inherits the dually flat information-geometric structure and all associated projection, Pythagorean, and Legendre duality theorems, with the divergence representing the per-step KL divergence rate (Nagaoka, 2017).
- Computational geometry and the simplex: Embedding models into a high-dimensional probability simplex via discretization enables computational realization of information-geometric algorithms, model selection via geometric criteria, and uncertainty quantification through curvature and volume calculations in the induced metric (Anaya-Izquierdo et al., 2012).
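A minimal sketch of the simplex-embedding idea (assumptions: a uniform grid discretization and the closed-form Fisher–Rao distance on the simplex via the square-root/sphere embedding; not code from Anaya-Izquierdo et al.):

```python
import numpy as np

def discretize(pdf, grid):
    """Embed a density model into the probability simplex by discretization."""
    w = pdf(grid)
    return w / w.sum()

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance on the simplex: the square-root embedding
    x_i = sqrt(p_i) maps it to a sphere, giving twice the great-circle angle."""
    return 2 * np.arccos(np.clip(np.sqrt(p * q).sum(), -1.0, 1.0))

grid = np.linspace(-8, 8, 2001)
gauss = lambda mu: (lambda x: np.exp(-0.5 * (x - mu) ** 2))  # unnormalized Gaussians
p = discretize(gauss(0.0), grid)
q = discretize(gauss(0.5), grid)
d_pq = fisher_rao_distance(p, q)
d_pp = fisher_rao_distance(p, p)
print(d_pq, d_pp)  # a small positive distance, and zero for identical models
```

Once models live in the simplex, geometric criteria (curvature, volume) can be evaluated directly in the induced metric, which is the computational point of the embedding.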
7. Conformal, Pseudo-Riemannian, and Energetic Interpretations
- Physical space-time as a statistical manifold: Assigning probability distributions to spatial points, treated as “blurred,” induces a Fisher–Rao metric on physical space that is only determined up to conformal factor, matching the conformal freedom in the geometric formulation of general relativity. Thus, information-theoretic considerations select precisely the conformal class of spatial metrics required in canonical gravity (Caticha, 2015).
- Large deviation perspective: The central large deviation rate functions (Shannon entropy, relative entropy) arise as geometric objects via Legendre–Fenchel duality, with empirical means forming a dually flat manifold parametrized by natural and expectation coordinates. The projection and energetic contraction operations correspond to entropy maximizers and minimal rate-function solutions (Muppirala et al., 2 Jan 2025).
- Pseudo-Riemannian embedding: The optimal transport geometry, encapsulated in the signature-$(n,n)$ Kim–McCann metric, displays dualistic statistical structures (metric + dual connections), interprets regularity conditions (e.g., the Ma–Trudinger–Wang tensor) as curvature, and encompasses classical Bregman/Fisher and constant-curvature geometries as special cases (Wong et al., 2019).
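The Legendre–Fenchel duality in the large deviation picture can be illustrated directly: by Cramér's theorem, the rate function for empirical means of Bernoulli variables is the Legendre transform of the cumulant generating function, and it coincides with the relative entropy (a grid-based sketch, not from the cited work):

```python
import numpy as np

def cgf_bernoulli(p, t):
    """Cumulant generating function log E[exp(t X)] for X ~ Bernoulli(p)."""
    return np.log(1 - p + p * np.exp(t))

def rate_function(p, x, t_grid=np.linspace(-20, 20, 200001)):
    """Legendre-Fenchel transform I(x) = sup_t [t x - CGF(t)], over a grid."""
    return np.max(t_grid * x - cgf_bernoulli(p, t_grid))

p, x = 0.3, 0.6
I = rate_function(p, x)
kl = x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))
print(I, kl)  # Cramer's rate function equals the relative entropy D(x || p)
```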
The field of extended information geometry thus systematically extends divergence-induced geometric structures to include non-statistical, quantum, conditional, compositional, and optimal-transport systems, allowing for the practical and conceptual unification of statistical inference, large deviations, information processing, and modern applications in machine learning and quantum theory. Key results include universality of divergence-based metrics, explicit classes of dual connections (with explicit cubic bias tensors), classification and parameterization of uniform priors via geometric covolume, and exact correspondences between statistical, energetic, and estimation-theoretic structures across diverse domains [(Naudts et al., 2015); (Muppirala et al., 2 Jan 2025); (Kuntz et al., 22 May 2025); (Caticha, 2015); (Matsuda et al., 2020); (Lebanon, 2012); (Nagaoka, 2017); (Floerchinger, 2023); (Felice et al., 2018); (Anaya-Izquierdo et al., 2012)].