Fisher–Rao-Geodesic Optimal Schedule

Updated 12 February 2026

Fisher–Rao-Geodesic Optimal Schedule is a method of time-parametrizing probability paths at constant speed using the Fisher–Rao metric to minimize statistical energy.
It leverages closed-form expressions and ODE-based approaches to derive geodesics across various probability spaces, including simplices and manifolds.
Applications span masked discrete diffusion, exponential family models, and non-commutative measures, enhancing efficiency in statistical and computational tasks.

A Fisher–Rao-geodesic optimal schedule is the time-parametrization of a probability path (or parameter evolution) that moves at constant speed along a geodesic of the Fisher–Rao metric. Such schedules arise in models where distributions, parameters, or densities evolve so as to minimize information-geometric energy. Equal time increments then correspond to equal Fisher–Rao length, or, to second order, to equal Kullback–Leibler divergence between consecutive distributions.

1. Formal Definition and Geometric Framework

The Fisher–Rao metric is the canonical Riemannian metric on spaces of probability distributions or positive densities, determining the statistical length of paths in these spaces. Given a smoothly parametrized family $\{\mu_t\}_{t\in[0,1]}$ of densities or distributions, the Fisher–Rao geodesic is the curve between two fixed endpoints that minimizes

$E[\mu] = \frac{1}{2} \int_0^1 \langle \dot\mu_t, \dot\mu_t \rangle^{\mathrm{FR}}_{\mu_t} dt,$

where $\langle\cdot, \cdot\rangle^{\mathrm{FR}}_{\mu_t}$ denotes the Fisher–Rao inner product at $\mu_t$. The optimal schedule is the parametrization of $t \mapsto \mu_t$ for which the Fisher–Rao speed is constant, i.e., the path covers its minimal statistical arc length at a uniform rate.
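The equivalence between minimizing $E$ and moving at constant speed follows from the Cauchy–Schwarz inequality. Writing $\|\dot\mu_t\|_{\mu_t}$ for the Fisher–Rao norm of the velocity and $L[\mu] = \int_0^1 \|\dot\mu_t\|_{\mu_t}\,dt$ for the length of the path (notation introduced here only for this derivation):

$L[\mu]^2 = \left(\int_0^1 \|\dot\mu_t\|_{\mu_t}\cdot 1\,dt\right)^2 \le \int_0^1 \|\dot\mu_t\|_{\mu_t}^2\,dt = 2E[\mu],$

with equality if and only if $\|\dot\mu_t\|_{\mu_t}$ is constant in $t$. Since $L[\mu]$ does not depend on the parametrization, the minimum of $E$ over paths with fixed endpoints is $d_{\mathrm{FR}}^2/2$, attained exactly by shortest paths traversed at constant speed.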

On spaces such as the probability simplex, smooth positive densities on manifolds, parametric families (e.g., Beta distributions), or spaces of matrix-valued densities, the Fisher–Rao metric yields closed-form expressions or systems of ODEs for geodesics and their optimal schedules (Bruveris et al., 2016, Monsaingeon et al., 2020, Brigant et al., 2019).

2. Closed-Form Schedules in the Probability Simplex and Manifolds

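On the open $(n-1)$-simplex $\Delta^{n-1} = \{p_i > 0, \sum_i p_i = 1\}$, the Fisher–Rao geodesics between $p^0,\,p^1$ are explicit. The square-root (Hellinger) embedding $p \mapsto \sqrt{p}$ sends the simplex onto the part of the unit sphere lying in the positive orthant, and geodesics are great-circle arcs between $\sqrt{p^0}$ and $\sqrt{p^1}$:

$p_i(t) = \left[ \frac{\sin((1-t)\theta)}{\sin\theta}\sqrt{p^0_i} + \frac{\sin(t\theta)}{\sin\theta}\sqrt{p^1_i} \right]^2, \qquad \cos\theta = \sum_i \sqrt{p^0_i\,p^1_i},$

where $\cos\theta$ is the Bhattacharyya coefficient of the endpoints, with the Fisher–Rao metric

$g_{ij}(p) = \delta_{ij}/p_i, \qquad \sum_i dp_i = 0.$

Since $\sum_i dp_i^2/p_i = 4\sum_i \big(d\sqrt{p_i}\big)^2$, the embedding scales lengths by a factor of 2, and the Fisher–Rao distance is $d_{\mathrm{FR}}(p^0,p^1) = 2\theta$.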

This path is the unique constant-speed geodesic (optimal schedule) joining $p^0$ and $p^1$ (Ciaglia et al., 2016). Analogous formulas apply to smooth probability densities $\rho_t(x)$ on a compact manifold $M$ (Bruveris et al., 2016). There, the Fisher–Rao metric is pulled back via the square-root density map, and the geodesic becomes a "great circle" interpolation between the square roots of the initial and final densities. For unnormalized positive densities, the square-root image is flat and geodesics are straight lines in $\sqrt{\rho}$ (Section 4).
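A minimal NumPy sketch of the simplex geodesic follows. The helper names `fisher_rao_simplex_geodesic` and `fisher_rao_distance` are hypothetical, and the distance uses the convention $d_{\mathrm{FR}}(p, q) = 2\arccos\sum_i \sqrt{p_i q_i}$, which matches the metric $g_{ij} = \delta_{ij}/p_i$. If the schedule has constant speed, points at equally spaced times should be equally far apart in Fisher–Rao distance, and the usage lines check exactly that.

```python
import numpy as np

def fisher_rao_distance(p, q):
    """Fisher-Rao distance for g_ij = delta_ij / p_i: twice the angle between sqrt(p) and sqrt(q)."""
    return 2.0 * np.arccos(np.clip(np.sqrt(p) @ np.sqrt(q), -1.0, 1.0))

def fisher_rao_simplex_geodesic(p0, p1, ts):
    """Constant-speed geodesic: great-circle (slerp) interpolation of square roots, then squaring."""
    u0 = np.sqrt(np.asarray(p0, float))
    u1 = np.sqrt(np.asarray(p1, float))
    theta = np.arccos(np.clip(u0 @ u1, -1.0, 1.0))   # arccos of the Bhattacharyya coefficient
    t = np.asarray(ts, float)[:, None]
    if theta < 1e-12:                                 # identical endpoints: constant path
        return np.repeat(u0[None, :] ** 2, len(t), axis=0)
    u = (np.sin((1.0 - t) * theta) * u0 + np.sin(t * theta) * u1) / np.sin(theta)
    return u ** 2                                     # stays on the simplex because |u| = 1

# Usage: 11 equally spaced times along the geodesic between two 3-class distributions
p0 = np.array([0.7, 0.2, 0.1])
p1 = np.array([0.1, 0.3, 0.6])
path = fisher_rao_simplex_geodesic(p0, p1, np.linspace(0.0, 1.0, 11))
steps = np.array([fisher_rao_distance(path[k], path[k + 1]) for k in range(10)])
```

Each entry of `steps` equals one tenth of `fisher_rao_distance(p0, p1)` up to rounding, which is the constant-speed property in discrete form.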

In parameter spaces of exponential families, such as the Beta family, optimal Fisher–Rao schedules are determined by integrating the geodesic ODEs derived from the Christoffel symbols of the Fisher metric. For endpoints $(\alpha_0,\beta_0)$ and $(\alpha_1, \beta_1)$, with coordinates $\theta = (\alpha, \beta)$, the geodesic between them is obtained by solving the boundary-value problem

$\ddot\theta^k + \Gamma^k_{ij}(\theta)\,\dot\theta^i\,\dot\theta^j = 0,\quad k=1,2,$

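subject to $\theta(0) = (\alpha_0,\beta_0)$ and $\theta(1) = (\alpha_1,\beta_1)$. Exact solutions of this equation automatically have constant Fisher–Rao speed; a path obtained only approximately (for example, from a discretized energy minimization) is reparametrized to constant arc length (Brigant et al., 2019).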
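The Beta case can be sketched numerically under a few stated assumptions. The Fisher metric in $(\alpha, \beta)$ is taken as the Hessian of $\log B(\alpha,\beta)$, i.e. $g_{\alpha\alpha} = \psi_1(\alpha) - \psi_1(\alpha+\beta)$, $g_{\alpha\beta} = -\psi_1(\alpha+\beta)$, $g_{\beta\beta} = \psi_1(\beta) - \psi_1(\alpha+\beta)$, with $\psi_1$ the trigamma function. Because $(\alpha,\beta)$ are affine in the natural parameters, the Christoffel symbols of the first kind are $\Gamma_{k,ij} = \tfrac12\,\partial_i\partial_j\partial_k \log B$, which involves the tetragamma function $\psi_2$. The boundary-value problem is solved by simple shooting (RK4 plus Newton with a finite-difference Jacobian). This is one possible choice, not necessarily the method of Brigant et al. The polygamma approximations (recurrence plus asymptotic series, used here to keep the sketch NumPy-only; `scipy.special.polygamma` is an alternative) and all function names are illustrative.

```python
import numpy as np

def trigamma(x):
    """psi_1(x), x > 0: recurrence psi_1(x) = psi_1(x + 1) + 1/x^2, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1/6 - inv2 * (1/30 - inv2 * (1/42 - inv2 / 30)))

def tetragamma(x):
    """psi_2(x), x > 0: recurrence psi_2(x) = psi_2(x + 1) - 2/x^3, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 2.0 / (x * x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc - inv2 * (1.0 + 1.0 / x + inv2 * (0.5 - inv2 * (1/6 - inv2 * (1/6 - 0.3 * inv2))))

def fisher_metric(a, b):
    """Fisher information of Beta(a, b) in (a, b) coordinates: Hessian of log B(a, b)."""
    t = trigamma(a + b)
    return np.array([[trigamma(a) - t, -t], [-t, trigamma(b) - t]])

def rhs(y):
    """Geodesic equation as a first-order system for y = (a, b, da/dt, db/dt)."""
    a, b, va, vb = y
    s, w = tetragamma(a + b), (va + vb) ** 2
    # T_l = sum_ij (d_l d_i d_j log B) v_i v_j, written with the third derivatives of log B
    T = np.array([tetragamma(a) * va ** 2 - s * w, tetragamma(b) * vb ** 2 - s * w])
    acc = -0.5 * np.linalg.solve(fisher_metric(a, b), T)   # = -Gamma^k_ij v^i v^j
    return np.array([va, vb, acc[0], acc[1]])

def integrate(theta0, v0, steps=200):
    """Classical RK4 on t in [0, 1]; returns all states, shape (steps + 1, 4)."""
    y = np.concatenate([theta0, v0]).astype(float)
    h = 1.0 / steps
    out = [y]
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(y)
    return np.array(out)

def beta_geodesic(theta0, theta1, steps=200, tol=1e-9, max_iter=30):
    """Shooting: Newton iteration on the initial velocity until theta(1) hits theta1."""
    theta0, theta1 = np.asarray(theta0, float), np.asarray(theta1, float)
    v = theta1 - theta0                                   # initial guess: straight line in (a, b)
    for _ in range(max_iter):
        path = integrate(theta0, v, steps)
        res = path[-1, :2] - theta1
        if np.linalg.norm(res) < tol:
            return path
        J = np.empty((2, 2))
        for j, dv in enumerate(1e-6 * np.eye(2)):         # forward-difference Jacobian
            J[:, j] = (integrate(theta0, v + dv, steps)[-1, :2] - path[-1, :2]) / 1e-6
        v = v - np.linalg.solve(J, res)
    raise RuntimeError("shooting did not converge")

# Usage: columns of the result are alpha, beta, d alpha/dt, d beta/dt on a uniform t-grid
path = beta_geodesic([2.0, 3.0], [6.0, 1.5])
```

Because exact geodesics have constant speed, $g(\dot\theta,\dot\theta)$ evaluated along the returned rows should stay constant up to integration error, which makes a convenient sanity check on the shooting result.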

3. Application to Masked Discrete Diffusion: The Cosine Schedule

In masked discrete diffusion models with factorized Markovian forward processes, the probability path $q_t(x)$ for data vectors $x_0 \in [m-1]^n$ (with mask symbol $m$) induces a 1D statistical manifold. The scalar Fisher information is

$I(t) = n \frac{\dot{\alpha}_t^2}{\alpha_t(1-\alpha_t)},$

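where $\alpha_t$ is the probability that a token has not been masked by time $t$. Writing the schedule as $\alpha_t = \phi(t)$, the Fisher information in the coordinate $\phi$ is $I(\phi) = n/(\phi(1-\phi))$, so that $I(t) = I(\phi)\,(\phi')^2$. Minimizing the energy $\frac{1}{2}\int_0^1 I(t)\,dt$ by the calculus of variations yields the geodesic equation

$\phi'' + \frac{I'(\phi)}{2I(\phi)} (\phi')^2 = 0.$

Multiplying by $2I(\phi)\phi'$ shows that $I(\phi)\,(\phi')^2$ is conserved, i.e., the Fisher–Rao speed is constant. With $\phi(0) = 1$ and $\phi(1) = 0$, the substitution $\phi = \cos^2 u$ with $u \in [0, \pi/2]$ turns $\sqrt{I(\phi)}\,\phi' = -c$ into $u' = c/(2\sqrt{n})$, so $u$ grows linearly from $0$ to $\pi/2$ and the solution is the cosine schedule: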

$\alpha(t) = \cos^2\left(\frac{\pi t}{2}\right),$

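ensuring constant Fisher–Rao speed $\sqrt{I(t)} = \pi\sqrt{n}$ along the diffusion path (Zhang, 6 Aug 2025). By contrast, the linear schedule $\alpha_t = 1 - t$ gives $I(t) = n/(t(1-t))$, which diverges at both endpoints; linear or quadratic schedules do not equalize the Fisher–Rao increments and are therefore suboptimal in this sense.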
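The constant-speed property is easy to check numerically. This sketch evaluates $\sqrt{I(t)}$ for the cosine and linear schedules on an interior time grid (the cosine expression is $0/0$ at $t = 0$ and $t = 1$); the helper name `fisher_speed` and the sequence length `n = 128` are arbitrary choices for illustration.

```python
import numpy as np

def fisher_speed(alpha, dalpha, n):
    """Fisher-Rao speed sqrt(I(t)) with I(t) = n * alpha'(t)^2 / (alpha * (1 - alpha))."""
    return np.sqrt(n * dalpha ** 2 / (alpha * (1.0 - alpha)))

n = 128                              # hypothetical number of tokens
t = np.linspace(0.01, 0.99, 99)      # interior grid, step 0.01

# Cosine schedule alpha(t) = cos^2(pi t / 2), whose derivative is -(pi / 2) sin(pi t)
cos_speed = fisher_speed(np.cos(np.pi * t / 2) ** 2, -0.5 * np.pi * np.sin(np.pi * t), n)

# Linear schedule alpha(t) = 1 - t, whose derivative is -1
lin_speed = fisher_speed(1.0 - t, -np.ones_like(t), n)
```

The cosine speed equals $\pi\sqrt{n}$ at every grid point, while the linear speed $\sqrt{n/(t(1-t))}$ is smallest at $t = 1/2$ and grows toward the endpoints. Both schedules cover the same total Fisher–Rao length $\pi\sqrt{n}$; only the cosine schedule spreads it uniformly over time.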

4. Fisher–Rao Geodesics for General Measures and Extensions

Bures–Fisher–Rao geodesics for scalar or matrix-valued measures generalize the above notions. For scalar nonnegative measures, whose total mass is not fixed, the geodesic is a straight line in $\sqrt{\rho}$:

$\rho_t(x) = \left((1-t)\sqrt{\mu_0(x)} + t\sqrt{\mu_1(x)}\right)^2,$

with the pointwise growth rate

$r^*(t,x) = \frac{\partial_t \rho}{\rho} = 2\frac{\sqrt{\mu_1(x)}-\sqrt{\mu_0(x)}}{(1-t)\sqrt{\mu_0(x)} + t\sqrt{\mu_1(x)}},$

so that the squared Fisher–Rao speed $\int (r^*)^2\rho_t\,dx = 4\int\big(\sqrt{\mu_1(x)}-\sqrt{\mu_0(x)}\big)^2dx$ is independent of $t$ and the schedule evolves the density at uniform statistical speed (Chizat et al., 2015). In the non-commutative (matrix-valued) case, the closed-form geodesic is a normalized linear-in-$\sqrt{G}$ path, reparametrized so that the $L^2$-norm of the velocity field remains constant (Monsaingeon et al., 2020).
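For scalar measures, the closed forms above can be checked on a grid. In this sketch the two unnormalized Gaussian-shaped measures, the grid, and the helper names are hypothetical; the squared speed is computed as $\int (r^*)^2 \rho_t\,dx = \int (\partial_t\rho)^2/\rho_t\,dx$, up to whatever normalization constant a given reference attaches to the metric.

```python
import numpy as np

# Hypothetical example data: two unnormalized, strictly positive measures on a grid
x = np.linspace(-4.0, 4.0, 801)
dx = x[1] - x[0]
mu0 = np.exp(-0.5 * (x + 1.5) ** 2)
mu1 = 2.5 * np.exp(-((x - 1.0) ** 2))

def rho(t):
    """Geodesic rho_t = ((1 - t) sqrt(mu0) + t sqrt(mu1))^2, a straight line in sqrt(rho)."""
    return ((1.0 - t) * np.sqrt(mu0) + t * np.sqrt(mu1)) ** 2

def growth_rate(t):
    """Closed-form pointwise growth rate r*(t, x) = d_t rho / rho."""
    s = (1.0 - t) * np.sqrt(mu0) + t * np.sqrt(mu1)
    return 2.0 * (np.sqrt(mu1) - np.sqrt(mu0)) / s

def squared_speed(t):
    """Riemann-sum approximation of the integral of r*^2 rho_t over x."""
    return np.sum(growth_rate(t) ** 2 * rho(t)) * dx

# Usage: compare r* with a central difference, then evaluate the speed at 11 times
h, t0 = 1e-4, 0.3
fd = (rho(t0 + h) - rho(t0 - h)) / (2 * h) / rho(t0)
speeds = np.array([squared_speed(tau) for tau in np.linspace(0.0, 1.0, 11)])
```

The central difference agrees with `growth_rate` up to rounding because $\rho_t$ is quadratic in $t$, and every entry of `speeds` equals $4\int(\sqrt{\mu_1}-\sqrt{\mu_0})^2\,dx$, even though the total mass $\int\rho_t\,dx$ changes along the path.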

A significant extension is the Fisher–Rao–Wasserstein (FRW) metric, also known as Wasserstein–Fisher–Rao or Hellinger–Kantorovich, which interpolates between classical mass-preserving optimal transport and Fisher–Rao creation and annihilation of mass. Here, the optimal schedule jointly evolves a velocity field and a growth field, again by enforcing constant metric speed along the path (Chizat et al., 2015).

5. Computational Aspects and Numerical Construction

Closed-form schedules are available only in special cases (simplex geodesics, masked discrete diffusion, unnormalized scalar measures). For the Beta family and other general parameterizations, or for coupled metrics (e.g., LDDMM–Fisher–Rao metrics for varifolds, Fisher–Rao–Wasserstein geodesics), one numerically solves boundary-value ODE systems by shooting or collocation. Reparametrizing a computed geodesic by cumulative arc length then enforces constant Fisher–Rao speed (Brigant et al., 2019); reparametrization alone does not make a non-geodesic path optimal.
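Arc-length reparametrization can be sketched for a path known only at many sample points. The illustrative NumPy version below uses hypothetical function names. It approximates each segment's length with the midpoint rule under the simplex metric $g_{ij} = \delta_{ij}/p_i$ and resamples by linear interpolation in cumulative arc length. The demo path is the straight line in $p$, which is not a Fisher–Rao geodesic, so the result has constant speed but is not optimal.

```python
import numpy as np

def simplex_metric(p):
    """Fisher-Rao metric g_ij = delta_ij / p_i (applied to tangent vectors with sum 0)."""
    return np.diag(1.0 / p)

def segment_lengths(path, metric):
    """Midpoint-rule Fisher-Rao length of each segment of a sampled path of shape (K, d)."""
    d = np.diff(path, axis=0)
    mid = 0.5 * (path[1:] + path[:-1])
    G = np.array([metric(m) for m in mid])
    return np.sqrt(np.einsum("ki,kij,kj->k", d, G, d))

def reparametrize_by_arc_length(path, metric, num_points):
    """Resample a finely sampled path at equally spaced cumulative Fisher-Rao arc length."""
    s = np.concatenate([[0.0], np.cumsum(segment_lengths(path, metric))])
    targets = np.linspace(0.0, s[-1], num_points)
    # Each coordinate is linearly interpolated as a function of cumulative arc length
    return np.stack([np.interp(targets, s, path[:, j]) for j in range(path.shape[1])], axis=1)

# Usage: a straight line in p (a valid path, but not a Fisher-Rao geodesic)
p0 = np.array([0.7, 0.2, 0.1])
p1 = np.array([0.1, 0.3, 0.6])
t = np.linspace(0.0, 1.0, 4001)[:, None]
fine = (1 - t) * p0 + t * p1
resampled = reparametrize_by_arc_length(fine, simplex_metric, 21)
```

Consecutive rows of `resampled` are then separated by (numerically) equal Fisher–Rao length, one twentieth of the total; the same routine applied to the samples of a computed geodesic yields its constant-speed schedule.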

For registration and matching in geometric measure spaces (e.g., varifolds under LDDMM–Fisher–Rao), optimal schedules are determined by solving Hamiltonian systems with Fisher–Rao controls, using numerical shooting algorithms, back-propagation through the ODE system, and efficient kernel summation (Hsieh et al., 2021).

6. Extensions and Significance

The optimal-schedule concept extends to other Riemannian metrics on spaces of distributions: for Wasserstein, Sobolev, and related metrics, the corresponding geodesic equations yield their own constant-speed optimal schedules. In practical settings (e.g., discretized diffusion, image registration, non-commutative probability measures), Fisher–Rao-geodesic schedules offer theoretical efficiency and have been reported to improve empirical fidelity to the underlying data or model constraints. The cosine schedule's strong empirical performance in diffusion sampling has been attributed to its Fisher–Rao-geodesic optimality (Zhang, 6 Aug 2025).

Existence and uniqueness of these geodesics are often guaranteed by strictly negative sectional curvature (as in the Beta manifold (Brigant et al., 2019)), which yields unique geodesics on complete, simply connected manifolds, or by convexity of the variational functional (as in the FRW and LDDMM–Fisher–Rao frameworks). The Fisher–Rao-geodesic optimal schedule thus unifies diverse applications where optimal transport, entropy minimization, and statistical efficiency intersect.