Fisher–Rao Distance Overview
- The Fisher–Rao distance is the geodesic distance induced by the Fisher information metric, a canonical Riemannian metric on statistical models; it quantifies the minimal statistical length between probability distributions.
- It underpins diverse applications in information geometry, clustering, and unbalanced optimal transport by providing geodesic insights on statistical manifolds.
- Robust numerical approximations and closed-form expressions for standard families enable its use in high-dimensional models and extend to functional data and copula settings.
The Fisher–Rao distance is a canonical Riemannian geodesic distance derived from the Fisher information metric on the parameter space of statistical families or, more generally, on the cone of non-negative measures. It plays a central role in information geometry, statistical inference, clustering of distributions, functional data analysis, and unbalanced optimal transport theory. The Fisher–Rao distance quantifies the minimal “statistical” length between probability distributions, taking into account the information structure of the model and its symmetries.
1. Foundations: Definition and Geometric Characterization
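Given a regular parametric family $\{p_\theta : \theta \in \Theta \subset \mathbb{R}^d\}$, the Fisher information matrix is defined as

$$I(\theta)_{ij} = \mathbb{E}_{p_\theta}\left[\partial_{\theta_i}\log p_\theta(X)\,\partial_{\theta_j}\log p_\theta(X)\right].$$

Equipped with this matrix, $\Theta$ becomes a Riemannian manifold $(\Theta, I)$, and the Fisher–Rao distance is the geodesic length between $\theta_1$ and $\theta_2$:

$$d_{FR}(\theta_1,\theta_2) = \inf_{\gamma}\int_0^1 \sqrt{\dot\gamma(t)^\top I(\gamma(t))\,\dot\gamma(t)}\,dt, \qquad \gamma(0)=\theta_1,\ \gamma(1)=\theta_2.$$

In the simplest one-parameter case, the Fisher–Rao distance reduces to an explicit integral along the parameter axis:

$$d_{FR}(\theta_1,\theta_2) = \left|\int_{\theta_1}^{\theta_2}\sqrt{I(\theta)}\,d\theta\right|.$$

For arbitrary (possibly singular) non-negative measures $\mu_0, \mu_1$ on $\Omega$, the Fisher–Rao distance is characterized by the "square-root" formula

$$d_{FR}(\mu_0,\mu_1)^2 = 4\int_\Omega\left(\sqrt{\frac{d\mu_0}{d\lambda}}-\sqrt{\frac{d\mu_1}{d\lambda}}\right)^2 d\lambda$$

for any reference measure $\lambda$ (e.g., Lebesgue) dominating both $\mu_0$ and $\mu_1$. The factor 4 corresponds to the Fisher information normalization; some authors use a different constant. This induces an isometric embedding into the positive cone of $L^2(\lambda)$ through $\mu \mapsto 2\sqrt{d\mu/d\lambda}$, with geodesics given by

$$\mu_t = \left((1-t)\sqrt{\frac{d\mu_0}{d\lambda}} + t\sqrt{\frac{d\mu_1}{d\lambda}}\right)^2 \lambda, \qquad t\in[0,1].$$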
Such geometry underpins both discrete and continuous statistical models (Chizat et al., 2015, Mielke, 2 Oct 2025).
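As a rough numerical check of the one-parameter case, the sketch below integrates $\sqrt{I(\theta)}$ with a midpoint rule and compares the result with the standard closed forms for the Poisson family ($I(\lambda)=1/\lambda$) and the Bernoulli family ($I(\theta)=1/(\theta(1-\theta))$). The helper name `fisher_rao_1d` and the step count are illustrative assumptions, not taken from the cited works.

```python
import math

def fisher_rao_1d(fisher_info, theta1, theta2, n=10_000):
    """Approximate |integral of sqrt(I(theta))| over [theta1, theta2] (midpoint rule)."""
    lo, hi = sorted((theta1, theta2))
    h = (hi - lo) / n
    return h * sum(math.sqrt(fisher_info(lo + (k + 0.5) * h)) for k in range(n))

# Fisher information of two standard one-parameter families.
def poisson_info(lam):
    return 1.0 / lam

def bernoulli_info(theta):
    return 1.0 / (theta * (1.0 - theta))

d_poisson = fisher_rao_1d(poisson_info, 1.0, 4.0)
d_bernoulli = fisher_rao_1d(bernoulli_info, 0.2, 0.7)

# Closed forms for comparison.
exact_poisson = 2.0 * abs(math.sqrt(4.0) - math.sqrt(1.0))
exact_bernoulli = 2.0 * abs(math.asin(math.sqrt(0.7)) - math.asin(math.sqrt(0.2)))

print(d_poisson, exact_poisson)
print(d_bernoulli, exact_bernoulli)
```

The same helper applies to any one-parameter family whose Fisher information is available as a function, provided the integrand stays finite on the integration interval.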
2. Dynamical and Interpolating Formulations
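The Fisher–Rao metric arises as a special (“pure reaction”) case of the Benamou–Brenier dynamical formulation for optimal transport, where only creation/annihilation (no spatial transport) is allowed. The dynamic equation becomes $\partial_t\rho_t = \alpha_t\rho_t$, where $\alpha_t$ is the growth rate, with action

$$\int_0^1\int_\Omega \alpha_t(x)^2\,d\rho_t(x)\,dt.$$

The minimizer recovers the explicit “square-root” geodesic above. More generally, the interpolating Wasserstein–Fisher–Rao metric introduces a source term alongside the transport term, $\partial_t\rho_t + \nabla\cdot(\rho_t v_t) = \alpha_t\rho_t$, with action

$$\int_0^1\int_\Omega \left(|v_t(x)|^2 + \delta^2\,\alpha_t(x)^2\right) d\rho_t(x)\,dt,$$

where $\delta>0$ is a length-scale parameter. This action interpolates between the Wasserstein and Fisher–Rao distances as $\delta\to\infty$ and $\delta\to 0$ (after rescaling by $1/\delta$), respectively. This framework underlies unbalanced optimal transport (the Wasserstein–Fisher–Rao or Hellinger–Kantorovich metric), allowing comparison of measures with different masses (Chizat et al., 2015, Wang et al., 2019).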
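The pure-reaction action can be checked numerically on discrete measures. The sketch below assumes the normalization $\int_0^1\sum_i \alpha_i^2\rho_i\,dt = \int_0^1\sum_i (\partial_t\rho_i)^2/\rho_i\,dt$ (conventions differ by constant factors across papers) and strictly positive weights. It evaluates the action along the square-root interpolation and along a plain linear interpolation; under this normalization the former should match $4\sum_i(\sqrt{\rho_{1,i}}-\sqrt{\rho_{0,i}})^2$ and the latter should be larger. All function names are hypothetical.

```python
import math

def sqrt_geodesic(rho0, rho1, t):
    """Pointwise square-root interpolation between two positive weight vectors."""
    return [((1 - t) * math.sqrt(a) + t * math.sqrt(b)) ** 2 for a, b in zip(rho0, rho1)]

def linear_path(rho0, rho1, t):
    """Plain linear interpolation of the weights (not a Fisher-Rao geodesic)."""
    return [(1 - t) * a + t * b for a, b in zip(rho0, rho1)]

def reaction_action(path, n_steps=2000):
    """Approximate the time integral of sum_i (d rho_i/dt)^2 / rho_i.

    Uses centered differences for the time derivative and a midpoint rule in t.
    """
    h = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * h
        before, mid, after = path(t - h / 2), path(t), path(t + h / 2)
        total += h * sum(((a - b) / h) ** 2 / m for b, m, a in zip(before, mid, after))
    return total

# Two measures with different total mass (1.7 versus 2.7).
rho0 = [0.5, 1.0, 0.2]
rho1 = [2.0, 0.3, 0.4]

closed_form = 4.0 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(rho0, rho1))
action_sqrt = reaction_action(lambda t: sqrt_geodesic(rho0, rho1, t))
action_linear = reaction_action(lambda t: linear_path(rho0, rho1, t))

print(closed_form, action_sqrt, action_linear)
```

Because the two measures have different total mass, this also illustrates why the pure-reaction geometry is usable for unbalanced comparisons.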
3. Explicit Forms for Standard Families
Closed-form Fisher–Rao distances are known for key statistical models:
| Model | Parameter(s) | Fisher–Rao Distance |
|---|---|---|
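| Binomial ($n$ fixed) | $\theta\in(0,1)$ | $2\sqrt{n}\,\lvert\arcsin\sqrt{\theta_1}-\arcsin\sqrt{\theta_2}\rvert$ |
| Poisson ($\lambda$) | $\lambda>0$ | $2\,\lvert\sqrt{\lambda_1}-\sqrt{\lambda_2}\rvert$ |
| Categorical, simplex ($\Delta^{d-1}$) | $p$, $q$ | $2\arccos\left(\sum_{i}\sqrt{p_i q_i}\right)$ |
| Gaussian ($\mu$ fixed) | $\Sigma$, $d$-dim SPD | $\sqrt{\tfrac{1}{2}\sum_{i=1}^{d}\log^2\lambda_i}$, $\lambda_i$ eigs of $\Sigma_1^{-1}\Sigma_2$ |
| Univariate location–scale ellip. | $(\mu,\sigma)$ | $b\,\operatorname{arccosh}\left(1+\frac{(a/b)^2(\mu_1-\mu_2)^2+(\sigma_1-\sigma_2)^2}{2\sigma_1\sigma_2}\right)$ for Fisher metric $(a^2\,d\mu^2+b^2\,d\sigma^2)/\sigma^2$; normal: $a=1$, $b=\sqrt{2}$ |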
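| Multivariate normal (fixed covariance $\Sigma$) | $\mu$ | $\sqrt{(\mu_1-\mu_2)^\top\Sigma^{-1}(\mu_1-\mu_2)}$ (Mahalanobis distance) |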
These formulas generalize to certain copula families (e.g., Gaussian copulas, with the same covariance-cone metric), elliptical models, and Wishart distributions (Miyamoto et al., 2023, Wells et al., 2020, Marti et al., 2016).
In the case of categorical distributions or the unit simplex, the Fisher–Rao metric corresponds to the geodesic (great-circle) distance on the positive orthant of the sphere under the square-root embedding: $d_{FR}(p,q) = 2\arccos\left(\sum_i \sqrt{p_i q_i}\right)$ (Picot et al., 2021, Miyamoto et al., 2023).
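The standard closed forms for the categorical, Poisson, binomial, and univariate normal families are short enough to write down directly. The following is a minimal sketch with hypothetical function names; the clamp before `acos` only guards against floating-point round-off.

```python
import math

def fr_categorical(p, q):
    """2 * arccos of the Bhattacharyya coefficient sum_i sqrt(p_i * q_i)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))  # clamp guards against bc slightly above 1

def fr_poisson(lam1, lam2):
    return 2.0 * abs(math.sqrt(lam1) - math.sqrt(lam2))

def fr_binomial(n, theta1, theta2):
    return 2.0 * math.sqrt(n) * abs(math.asin(math.sqrt(theta1)) - math.asin(math.sqrt(theta2)))

def fr_normal(mu1, sigma1, mu2, sigma2):
    """Univariate normal N(mu, sigma^2): scaled hyperbolic distance in the (mu, sigma) half-plane."""
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (4.0 * sigma1 * sigma2))

print(fr_categorical([1.0, 0.0], [0.0, 1.0]))  # disjoint supports: the maximal value, pi
print(fr_poisson(1.0, 4.0))                    # 2.0
print(fr_binomial(1, 0.2, 0.7), fr_categorical([0.2, 0.8], [0.7, 0.3]))
print(fr_normal(0.0, 1.0, 0.0, math.e))
```

A Bernoulli variable is a two-outcome categorical variable, so the third line prints the same value twice up to round-off. For equal means, the normal distance reduces to $\sqrt{2}\,\lvert\log(\sigma_2/\sigma_1)\rvert$.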
4. Group Invariance, Maximal Invariants, and Reduction
The Fisher–Rao metric is preserved under natural group actions on both sample and parameter spaces (“transformation models”); such symmetries induce invariance of the geodesic distance (Nielsen et al., 24 Jun 2026, Nielsen, 2024). Like any $f$-divergence, the Fisher–Rao distance is unchanged when the same group element acts on both of its arguments, so it is constant on orbits of pairs of distributions. For location–scale and affine models, the Fisher–Rao distance therefore depends only on maximal invariants, reducing to a double-coset structure:
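- For the multidimensional Gaussian $N(\mu,\Sigma)$, the Fisher–Rao distance can be written in terms of the eigenvalues of $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$ and the transformed displacement $\delta = Q^\top\Sigma_1^{-1/2}(\mu_2-\mu_1)$: $d_{FR}\big(N(\mu_1,\Sigma_1), N(\mu_2,\Sigma_2)\big) = d_{FR}\big(N(0,I_d), N(\delta,\Lambda)\big)$ with $\Lambda=\operatorname{diag}(\lambda_1,\dots,\lambda_d)$, where $\lambda_i$ are the eigenvalues of $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$ and $Q$ is the orthogonal matrix of its eigenvectors (Nielsen et al., 24 Jun 2026, Wells et al., 2020). This reduction reflects the affine invariance of the Fisher–Rao metric; it suffices to analyze canonical forms under group actions.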
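The univariate normal family gives a small, checkable instance of this reduction. The sketch below uses the standard closed form for $N(\mu,\sigma^2)$ and verifies that the distance is unchanged by an affine map $x\mapsto ax+b$ applied to both distributions, and that it equals the distance between $N(0,1)$ and the canonical pair $(\delta,\lambda) = ((\mu_2-\mu_1)/\sigma_1,\ \sigma_2/\sigma_1)$. The helper names are hypothetical.

```python
import math

def fr_normal(mu1, sigma1, mu2, sigma2):
    """Closed-form Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (4.0 * sigma1 * sigma2))

def push_affine(mu, sigma, a, b):
    """Parameters of a*X + b when X ~ N(mu, sigma^2), for a != 0."""
    return a * mu + b, abs(a) * sigma

def canonical_form(mu1, sigma1, mu2, sigma2):
    """Image of the second distribution under x -> (x - mu1) / sigma1."""
    return (mu2 - mu1) / sigma1, sigma2 / sigma1

mu1, s1, mu2, s2 = 1.0, 0.5, -2.0, 3.0
original = fr_normal(mu1, s1, mu2, s2)

# Same affine map applied to both distributions.
a, b = -4.0, 7.0
transformed = fr_normal(*push_affine(mu1, s1, a, b), *push_affine(mu2, s2, a, b))

# Reduction to the canonical pair (N(0, 1), N(delta, lam^2)).
delta, lam = canonical_form(mu1, s1, mu2, s2)
reduced = fr_normal(0.0, 1.0, delta, lam)

print(original, transformed, reduced)
```

In higher dimensions the same idea applies with $\Sigma_1^{-1/2}$ in place of $1/\sigma_1$, although the reduced distance then generally has no closed form.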
5. Numerical Approximations and Bounding Techniques
In most multivariate or high-dimensional settings, closed-form expressions for the Fisher–Rao distance are unavailable. Robust, theoretically justified approximation and bounding techniques have been developed (Nielsen, 2024, Nielsen, 2023, Nielsen, 2023):
- Curve discretization: Approximate the true geodesic by subdividing a path (e.g., the mixture or exponential curve) and summing square roots of local Jeffreys or Bregman divergences.
- Lower and upper bounds: The Calvo–Oller isometric embedding of the $d$-variate normal manifold into the $(d+1)$-dimensional SPD cone yields a closed-form lower bound; upper bounds utilize Fisher–Manhattan (sum-of-1D), triangle-inequality, or symmetrized Bregman divergences.
- Proxy distances: Pullback of Hilbert or Birkhoff projective cone distances offers fast computation with controlled deviation from the true Fisher–Rao (Nielsen, 2023).
- Guaranteed error control: Recursive bracketing and adaptive discretization schemes provide additive or multiplicative error guarantees for the numerical Fisher–Rao computation (Nielsen, 2024).
In practical applications, algorithms discretize geodesic paths or leverage proximal splitting techniques (for unbalanced OT/Fisher–Rao models) (Chizat et al., 2015, Nielsen, 2023). For multivariate normals, local pairwise segments are approximated via $\sqrt{D_J(p_{\theta_i}, p_{\theta_{i+1}})}$, the square root of the Jeffreys (symmetrized KL) divergence $D_J(p,q) = \mathrm{KL}(p\,\|\,q) + \mathrm{KL}(q\,\|\,p)$, with convergence to the exact geodesic length as the discretization fineness increases (Nielsen, 2023, Nielsen, 2023).
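The discretization idea can be illustrated in a case where the geodesic and the exact distance are both known. The sketch below is an assumption-laden toy, not the multivariate algorithm of the cited papers: it takes univariate normals with a common mean, whose Fisher–Rao geodesic is $\sigma(t)=\sigma_1^{1-t}\sigma_2^{t}$ with length $\sqrt{2}\,\lvert\log(\sigma_2/\sigma_1)\rvert$, and sums $\sqrt{D_J}$ over consecutive points of the discretized geodesic. Function names are hypothetical.

```python
import math

def jeffreys_normal(mu1, s1, mu2, s2):
    """Jeffreys divergence KL(p||q) + KL(q||p) between two univariate normals."""
    d2 = (mu1 - mu2) ** 2
    return (s1 ** 2 + d2) / (2.0 * s2 ** 2) + (s2 ** 2 + d2) / (2.0 * s1 ** 2) - 1.0

def discretized_length(path, n):
    """Sum of sqrt(Jeffreys) over n consecutive segments of path(t), t in [0, 1]."""
    pts = [path(k / n) for k in range(n + 1)]
    return sum(
        math.sqrt(max(0.0, jeffreys_normal(*p, *q)))  # max() guards against round-off
        for p, q in zip(pts, pts[1:])
    )

mu, sigma1, sigma2 = 0.0, 1.0, 3.0

def geodesic(t):
    # Known geodesic for a fixed mean: geometric interpolation of the scale.
    return mu, sigma1 ** (1.0 - t) * sigma2 ** t

exact = math.sqrt(2.0) * abs(math.log(sigma2 / sigma1))
lengths = {n: discretized_length(geodesic, n) for n in (1, 10, 100, 1000)}

for n, value in lengths.items():
    print(n, value, value - exact)
```

With a single segment the sum is just $\sqrt{D_J}$, which overestimates the distance; the excess shrinks as the number of segments grows.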
6. Extensions: Functional Data, Truncated Laws, and Other Settings
The Fisher–Rao metric and distance extend beyond finite-dimensional exponential families:
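- Functional Data: For real-valued absolutely continuous functions, the Fisher–Rao metric on the space of curves $\mathcal{F}$ is, for tangent vectors $v_1, v_2$ at $f$,

$$\langle\langle v_1, v_2\rangle\rangle_f = \frac{1}{4}\int_0^1 \dot v_1(t)\,\dot v_2(t)\,\frac{1}{|\dot f(t)|}\,dt,$$

and, via the square-root velocity function (SRVF) transform $q = \operatorname{sign}(\dot f)\sqrt{|\dot f|}$, the Fisher–Rao geodesic distance simplifies to the $L^2$ metric on SRVFs. This underlies time-warping-invariant registration and mean template computation (Srivastava et al., 2011).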
- Truncated Distributions: For families such as truncated normals, the Fisher metric and geodesic equations adjust to include derivatives of the normalization constant with respect to parameters, often requiring semi-analytic or numerical computation. The corresponding geodesic ODE is solved numerically, enabling robust uncertainty quantification under parameter constraints (Ketema et al., 2024).
- Copulas and Dependence Modeling: The Fisher–Rao distance on Gaussian or elliptic copulas captures dependence structure and is used in multivariate time series clustering. Its sensitivity near the boundary (e.g., correlation $|\rho|\to 1$ for Gaussian copulas) is both a theoretical feature and a practical limitation, motivating comparison with transport-based metrics (Marti et al., 2016).
- Unbalanced Optimal Transport: The Fisher–Rao metric functions as the growth-decay penalty in the dynamical optimal transport equation with source term. The WFR metric and related Sinkhorn-based solvers allow its use in scalable document distance, clustering, and cross-domain matching (Wang et al., 2019, Chizat et al., 2015).
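For the functional-data setting in the list above, the SRVF representation makes the distance easy to approximate on a grid. The sketch below assumes functions on $[0,1]$, the SRVF $q=\operatorname{sign}(\dot f)\sqrt{|\dot f|}$ estimated by finite differences, and a midpoint-rule $L^2$ norm. It also checks numerically that warping both functions by the same increasing map $\gamma$ leaves the distance essentially unchanged. The function names, test functions, and grid size are illustrative assumptions.

```python
import math

def srvf(f, n=2000):
    """SRVF values sign(f') * sqrt(|f'|) on the n cells of a uniform grid on [0, 1]."""
    h = 1.0 / n
    q = []
    for k in range(n):
        slope = (f((k + 1) * h) - f(k * h)) / h  # finite-difference derivative on the cell
        q.append(math.copysign(math.sqrt(abs(slope)), slope))
    return q

def srvf_distance(f1, f2, n=2000):
    """L2 distance between the SRVFs of f1 and f2 (midpoint rule)."""
    h = 1.0 / n
    q1, q2 = srvf(f1, n), srvf(f2, n)
    return math.sqrt(h * sum((a - b) ** 2 for a, b in zip(q1, q2)))

def f1(t):
    return t

def f2(t):
    return t * t

def gamma(t):
    # An increasing warp of [0, 1] with gamma(0) = 0, gamma(1) = 1, gamma' > 0.
    return 0.5 * (t + t * t)

d_plain = srvf_distance(f1, f2)
d_warped = srvf_distance(lambda t: f1(gamma(t)), lambda t: f2(gamma(t)))

# For f1(t) = t and f2(t) = t^2 the SRVFs are 1 and sqrt(2 t), so the squared
# distance is the integral of (1 - sqrt(2 t))^2 over [0, 1] = 2 - 4*sqrt(2)/3.
exact = math.sqrt(2.0 - 4.0 * math.sqrt(2.0) / 3.0)

print(d_plain, d_warped, exact)
```

This invariance under simultaneous warping is the property that registration methods exploit when they optimize over $\gamma$ to align one function to another.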
7. Relations to Other Distances and Applications
The Fisher–Rao metric is intimately connected to other information-theoretic divergences:
- Hellinger Distance: On the cone of non-negative measures, the Fisher–Rao distance coincides (up to scaling) with the Hellinger distance, because geodesics in the infinite-dimensional positive cone are straight lines in square-root coordinates. On the probability simplex and on statistical subfamilies, paths are constrained to the submanifold, so the Fisher–Rao distance is at least as large as the (scaled) Hellinger distance, equaling it when the parametric submanifold is totally geodesic (Mielke, 2 Oct 2025, Chizat et al., 2015).
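- KL and Jeffreys Divergence: For small parameter separation, the KL divergence matches half the squared Fisher–Rao distance to second order, $\mathrm{KL}(p_\theta\,\|\,p_{\theta+d\theta}) \approx \tfrac{1}{2}\,d_{FR}^2$, so the Jeffreys divergence $D_J$ (symmetrized KL) satisfies $D_J \approx d_{FR}^2$ locally. Its square root $\sqrt{D_J}$ provides a tight upper bound and an efficient means for approximation (Nielsen, 2024, Nielsen, 2023).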
- Wasserstein Distance: The contrast with optimal-transport-based metrics is significant. On Gaussian covariance models the Fisher–Rao geometry has nonpositive sectional curvature and is extremely sensitive to high correlation or near-degenerate densities, whereas the Wasserstein metric remains well defined for degenerate measures and discriminates more regularly (Marti et al., 2016).
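The Hellinger and KL/Jeffreys relations above can be checked on categorical distributions. The sketch below assumes the Fisher-information normalization $d_{FR}=2\arccos\sum_i\sqrt{p_iq_i}$, so that the matching scaled Hellinger (chord) distance is $2\lVert\sqrt{p}-\sqrt{q}\rVert_2$. It compares arc and chord for a distant pair and compares $d_{FR}^2$ with $2\,\mathrm{KL}$ and with the Jeffreys divergence for a nearby pair. Function names and the example vectors are hypothetical.

```python
import math

def fisher_rao(p, q):
    """Arc length on the radius-2 sphere under the embedding p -> 2*sqrt(p)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))

def scaled_hellinger(p, q):
    """Chord length 2 * ||sqrt(p) - sqrt(q)||_2 under the same embedding."""
    return 2.0 * math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def jeffreys(p, q):
    return kl(p, q) + kl(q, p)

p = [0.3, 0.3, 0.4]
q_near = [0.31, 0.29, 0.4]
q_far = [0.7, 0.2, 0.1]

# Arc versus chord: the Fisher-Rao distance is never smaller than the chord.
print(scaled_hellinger(p, q_far), fisher_rao(p, q_far))

# Local agreement: d_FR^2 is close to 2*KL and to the Jeffreys divergence.
d2_near = fisher_rao(p, q_near) ** 2
print(d2_near, 2.0 * kl(p, q_near), jeffreys(p, q_near))

# Farther apart, sqrt(Jeffreys) still upper-bounds the distance in this example.
print(fisher_rao(p, q_far), math.sqrt(jeffreys(p, q_far)))
```

For the distant pair the chord and arc differ only slightly, which shows how close the two notions remain on the simplex unless the distributions are nearly orthogonal.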
Practical applications include adversarial robustness via Fisher–Rao regularization in neural networks (using closed formulae for categorical models) (Picot et al., 2021), functional alignment in signal and shape analysis (Srivastava et al., 2011), robust UQ under parametric perturbations (Ketema et al., 2024), and computational tools for distribution clustering and geometry-aware machine learning (Nielsen, 2023, Wang et al., 2019, Chizat et al., 2015).
References:
(Chizat et al., 2015, Wells et al., 2020, Mielke, 2 Oct 2025, Picot et al., 2021, Nielsen, 2023, Miyamoto et al., 2023, Nielsen, 2024, Srivastava et al., 2011, Nielsen, 2023, Marti et al., 2016, Ketema et al., 2024, Nielsen et al., 24 Jun 2026, Florin, 20 May 2025, Wang et al., 2019)