Fisher–Rao-Geodesic Optimal Schedule
- Fisher–Rao-Geodesic Optimal Schedule is a method of time-parametrizing probability paths at constant speed using the Fisher–Rao metric to minimize statistical energy.
- It leverages closed-form expressions and ODE-based approaches to derive geodesics across various probability spaces, including simplices and manifolds.
- Applications span masked discrete diffusion, exponential family models, and non-commutative measures, enhancing efficiency in statistical and computational tasks.
A Fisher–Rao–Geodesic Optimal Schedule is the time-parametrization of a probability path (or parameter evolution) that corresponds to constant-speed motion along the geodesic in the information geometry induced by the Fisher–Rao metric. Such schedules are fundamental in models where probability distributions, parameters, or densities evolve according to principles that minimize information-geometric energy, ensuring each infinitesimal update is optimally efficient in terms of statistical length or divergence.
1. Formal Definition and Geometric Framework
The Fisher–Rao metric is the canonical Riemannian metric on spaces of probability distributions or positive densities, determining the statistical length of paths in these spaces. Given a smoothly parametrized family of densities or distributions $\rho_t$, $t \in [0,1]$, the Fisher–Rao geodesic is the curve between two fixed endpoints $\rho_0$ and $\rho_1$ that minimizes the energy
$$E[\rho] = \frac{1}{2}\int_0^1 g_{\rho_t}(\dot\rho_t, \dot\rho_t)\,dt,$$
where $g_{\rho}(\cdot,\cdot)$ denotes the Fisher–Rao inner product at $\rho$. The optimal schedule is the specific parametrization of $\rho_t$ such that the Fisher–Rao "speed" $\sqrt{g_{\rho_t}(\dot\rho_t, \dot\rho_t)}$ is constant, i.e., the path traverses the minimal statistical arc-length at uniform rate.
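A short Cauchy–Schwarz argument shows why constant speed is the energy-minimizing parametrization. The notation in this sketch is assumed for illustration: $v(t)$ is the Fisher–Rao speed, $L$ the length, and $E$ the energy with a factor $1/2$ convention.

```latex
\begin{align*}
v(t) &= \sqrt{g_{\rho_t}(\dot\rho_t,\dot\rho_t)}, \qquad
L = \int_0^1 v(t)\,dt, \qquad
E = \frac{1}{2}\int_0^1 v(t)^2\,dt, \\
L^2 &= \Big(\int_0^1 1\cdot v(t)\,dt\Big)^2
\;\le\; \int_0^1 1^2\,dt \,\int_0^1 v(t)^2\,dt \;=\; 2E,
\qquad \text{with equality iff } v(t) \equiv L .
\end{align*}
```

Because $L$ does not change under reparametrization, the energy of a given curve is smallest, $E = L^2/2$, exactly when the speed is constant.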
On sample spaces such as the probability simplex, positive smooth densities on manifolds, product families (e.g., Beta distributions), or even spaces of matrix-valued densities, the Fisher–Rao metric yields closed-form expressions or systems of ODEs for geodesics and their optimal schedules (Bruveris et al., 2016, Monsaingeon et al., 2020, Brigant et al., 2019).
2. Closed-Form Schedules in the Probability Simplex and Manifolds
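On the open $n$-simplex $\Delta_n = \{p \in \mathbb{R}^{n+1} : p_i > 0,\ \sum_i p_i = 1\}$, the Fisher–Rao geodesics between $p, q \in \Delta_n$ are explicitly given using the Hellinger embedding $p \mapsto \sqrt{p}$ as
$$\sqrt{p_i(t)} = \frac{\sin((1-t)\theta)}{\sin\theta}\sqrt{p_i} + \frac{\sin(t\theta)}{\sin\theta}\sqrt{q_i}, \qquad \theta = \arccos\Big(\sum_i \sqrt{p_i q_i}\Big), \quad t \in [0,1],$$
with the Fisher–Rao metric
$$g_p(u, v) = \sum_i \frac{u_i v_i}{p_i}, \qquad \sum_i u_i = \sum_i v_i = 0.$$
This path is the unique constant-speed geodesic (optimal schedule) joining $p$ and $q$ (Ciaglia et al., 2016). Analogous formulas apply to scalar densities or smooth positive measures on manifolds (Bruveris et al., 2016). There, the Fisher–Rao metric is pulled back via the square-root density map, and the geodesic becomes a "great circle" interpolation between initial and final densities.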
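The great-circle interpolation on the simplex can be sketched in a few lines. This is a minimal illustration, assuming the unnormalized metric $\sum_i u_i v_i / p_i$ (so that the distance is twice the angle between square-root embeddings). The function names are hypothetical and not taken from the cited works.

```python
import math

def fisher_rao_distance(p, q):
    """Fisher-Rao distance on the simplex: twice the angle between sqrt(p) and sqrt(q)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, bc))

def fisher_rao_geodesic(p, q, t):
    """Point at time t in [0, 1] on the constant-speed geodesic from p to q."""
    theta = 0.5 * fisher_rao_distance(p, q)
    if theta < 1e-12:  # endpoints coincide
        return list(p)
    w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    # Interpolate on the sphere in square-root coordinates, then square back.
    return [(w0 * math.sqrt(a) + w1 * math.sqrt(b)) ** 2 for a, b in zip(p, q)]

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
steps = 8
path = [fisher_rao_geodesic(p, q, k / steps) for k in range(steps + 1)]
# Distances between consecutive points on a uniform time grid.
increments = [fisher_rao_distance(path[k], path[k + 1]) for k in range(steps)]
```

On a uniform time grid, every entry of `increments` equals the total distance divided by `steps`, which is the constant-speed property in discrete form.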
In parameter spaces of exponential families, such as the Beta family, optimal Fisher–Rao schedules are determined by integrating the geodesic ODEs derived from the metric's Christoffel symbols. For endpoints $(\alpha_0, \beta_0)$ and $(\alpha_1, \beta_1)$, the geodesic $\theta(t) = (\alpha(t), \beta(t))$ between them, parameterized at constant Fisher–Rao speed, is obtained by integrating
$$\ddot\theta^k + \Gamma^k_{ij}(\theta)\,\dot\theta^i \dot\theta^j = 0, \qquad k = 1, 2,$$
and reparametrizing to constant arc-length (Brigant et al., 2019).
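To see why a geodesic solve or reparametrization is needed, one can evaluate the Fisher–Rao speed of a naive straight line in $(\alpha, \beta)$ coordinates. The sketch below uses the standard Fisher information matrix of the Beta family, expressed through the trigamma function $\psi'$. The pure-Python `trigamma` approximation and the chosen endpoints are illustrative assumptions, not part of the cited method.

```python
import math

def trigamma(x):
    """Approximate psi'(x) for x > 0 (recurrence plus asymptotic series)."""
    acc = 0.0
    while x < 10.0:
        acc += 1.0 / (x * x)  # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Series: 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    tail = inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return acc + inv + 0.5 * inv2 + tail

def beta_fisher_metric(a, b):
    """Entries (g11, g12, g22) of the Fisher matrix of Beta(a, b) in (a, b) coordinates."""
    tab = trigamma(a + b)
    return trigamma(a) - tab, -tab, trigamma(b) - tab

def fisher_rao_speed(a, b, da, db):
    """Fisher-Rao norm of the parameter velocity (da, db) at the point (a, b)."""
    g11, g12, g22 = beta_fisher_metric(a, b)
    return math.sqrt(g11 * da * da + 2.0 * g12 * da * db + g22 * db * db)

# Straight line from (1, 1) to (5, 3): the coordinate velocity is (4, 2) throughout.
speed_start = fisher_rao_speed(1.0, 1.0, 4.0, 2.0)
speed_end = fisher_rao_speed(5.0, 3.0, 4.0, 2.0)
```

The speed at the start is several times larger than at the end, so a straight line traversed at uniform coordinate rate is far from a constant-speed schedule.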
3. Application to Masked Discrete Diffusion: The Cosine Schedule
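In masked discrete diffusion models with factorized Markovian forward processes, the probability path for data vectors $x$ (with mask symbol $\mathbf{m}$) induces a 1D statistical manifold. Up to a constant factor counting the independent coordinates, the scalar Fisher information is
$$I(t) = \frac{\dot\alpha_t^2}{\alpha_t (1 - \alpha_t)},$$
where $\alpha_t$ encodes the probability of not being masked up to time $t$. The Fisher–Rao-geodesic optimal schedule is then obtained by extremizing the energy functional $\int_0^1 I(t)\,dt$ via calculus of variations (the arc length alone is invariant under reparametrization and does not fix the schedule), yielding the differential condition
$$\frac{\dot\alpha_t}{\sqrt{\alpha_t (1 - \alpha_t)}} = \text{const},$$
whose solution, after integration with the boundary conditions $\alpha_0 = 1$ and $\alpha_1 = 0$, leads to the cosine schedule:
$$\alpha_t = \cos^2\!\Big(\frac{\pi t}{2}\Big),$$
ensuring constant Fisher–Rao speed along the diffusion path (Zhang, 6 Aug 2025). Linear or quadratic schedules do not equalize the Fisher–Rao increments and are therefore suboptimal.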
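The equal-increment property can be checked numerically for a single token. This sketch assumes the cosine schedule has the form $\alpha_t = \cos^2(\pi t/2)$ with $\alpha_t$ the per-token probability of remaining unmasked. It uses the closed-form Fisher–Rao distance on the Bernoulli family, $2\,|\arccos\sqrt{\alpha} - \arccos\sqrt{\alpha'}|$. The function names are hypothetical.

```python
import math

def alpha_cosine(t):
    """Assumed cosine schedule: probability of being unmasked at time t."""
    return math.cos(0.5 * math.pi * t) ** 2

def alpha_linear(t):
    """Linear schedule, for comparison."""
    return 1.0 - t

def fisher_rao_increments(alpha, steps):
    """Fisher-Rao distances between consecutive Bernoulli(alpha_t) on a uniform time grid."""
    def angle(a):
        # Clamp to [0, 1] to guard against round-off before the square root.
        return math.acos(math.sqrt(min(1.0, max(0.0, a))))
    ts = [k / steps for k in range(steps + 1)]
    return [2.0 * abs(angle(alpha(ts[k + 1])) - angle(alpha(ts[k])))
            for k in range(steps)]

steps = 10
cos_inc = fisher_rao_increments(alpha_cosine, steps)
lin_inc = fisher_rao_increments(alpha_linear, steps)
```

Under the cosine schedule every increment equals $\pi/\text{steps}$, while the linear schedule takes its largest Fisher–Rao steps near the endpoints and its smallest near the middle.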
4. Fisher–Rao Geodesics for General Measures and Extensions
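Bures–Fisher–Rao geodesics for scalar or matrix-valued measures generalize the above notions. For scalar nonnegative measures with densities $\rho_0$ and $\rho_1$, one obtains
$$\rho_t = \big((1 - t)\sqrt{\rho_0} + t\sqrt{\rho_1}\big)^2, \qquad t \in [0,1],$$
with the pointwise "growth" or schedule
$$r_t = \frac{\partial_t \rho_t}{\rho_t} = \frac{2\,(\sqrt{\rho_1} - \sqrt{\rho_0})}{(1 - t)\sqrt{\rho_0} + t\sqrt{\rho_1}},$$
so that the Fisher–Rao schedule evolves the density at uniform statistical speed, i.e., $\int r_t^2\,d\rho_t$ is constant in $t$ (Chizat et al., 2015). In the non-commutative (matrix-valued) case, the closed-form geodesic is a normalized linear-in-$t$ path, reparametrized so that the norm of the velocity field remains constant (Monsaingeon et al., 2020).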
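For measures without a mass constraint, the geodesic is a straight line in square-root coordinates. The sketch below checks the constant-speed property on a discrete measure, assuming unit cell weights, strictly positive masses, and the squared speed $\sum_i r_i^2 \rho_i$. The helper names are illustrative only.

```python
import math

def fr_measure_geodesic(rho0, rho1, t):
    """Fisher-Rao geodesic between nonnegative measures: linear in sqrt coordinates."""
    return [((1.0 - t) * math.sqrt(a) + t * math.sqrt(b)) ** 2
            for a, b in zip(rho0, rho1)]

def growth_rate(rho0, rho1, t):
    """Pointwise growth rate r_t = (d/dt rho_t) / rho_t along the geodesic."""
    return [2.0 * (math.sqrt(b) - math.sqrt(a))
            / ((1.0 - t) * math.sqrt(a) + t * math.sqrt(b))
            for a, b in zip(rho0, rho1)]

def squared_speed(rho0, rho1, t):
    """Squared Fisher-Rao speed: sum of r_t^2 weighted by rho_t."""
    rho = fr_measure_geodesic(rho0, rho1, t)
    r = growth_rate(rho0, rho1, t)
    return sum(ri * ri * mi for ri, mi in zip(r, rho))

rho0 = [0.5, 2.0, 1.0]   # total mass 3.5
rho1 = [1.5, 0.5, 4.0]   # total mass 6.0 (mass is not conserved)
speeds = [squared_speed(rho0, rho1, t) for t in (0.1, 0.5, 0.9)]
expected = 4.0 * sum((math.sqrt(b) - math.sqrt(a)) ** 2 for a, b in zip(rho0, rho1))
```

All entries of `speeds` agree with `expected`, four times the squared distance between the square-root densities.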
A significant extension is the Fisher–Rao–Wasserstein (FRW) metric, interpolating classical mass-preserving optimal transport with Fisher–Rao creation/annihilation. Here, the optimal schedule jointly evolves velocity and growth fields, again by enforcing constant metric speed along the path (Chizat et al., 2015).
5. Computational Aspects and Numerical Construction
Closed-form schedules are available in special cases (simplex geodesics, masked discrete diffusions, beta families). For general parameterizations or for coupled metrics (e.g., LDDMM–Fisher–Rao metrics for varifolds, Fisher–Rao–Wasserstein geodesics), one numerically integrates boundary-value ODE systems, applying shooting or collocation methods. Reparametrization by cumulative arc-length ensures constant Fisher–Rao speed and optimality (Brigant et al., 2019).
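Reparametrization by cumulative arc length can be sketched generically: tabulate the speed on a fine grid, integrate it, and invert the cumulative length by interpolation. The example below is a minimal sketch under stated assumptions: a strictly positive speed, the trapezoid rule, and linear interpolation. The test curve is the Bernoulli path $\alpha(t) = 1 - t$ on $[0.05, 0.95]$. `constant_speed_times` is a hypothetical helper, not an API from the cited works.

```python
import bisect
import math

def constant_speed_times(speed, t0, t1, n_out, n_grid=2000):
    """Return n_out + 1 times in [t0, t1] equally spaced in cumulative arc length."""
    h = (t1 - t0) / n_grid
    ts = [t0 + i * h for i in range(n_grid + 1)]
    v = [speed(t) for t in ts]
    cum = [0.0]
    for i in range(n_grid):
        cum.append(cum[-1] + 0.5 * h * (v[i] + v[i + 1]))  # trapezoid rule
    total = cum[-1]
    out = []
    for k in range(n_out + 1):
        target = total * k / n_out
        # Locate the grid cell containing the target length, then interpolate linearly.
        j = min(max(bisect.bisect_left(cum, target), 1), n_grid)
        frac = (target - cum[j - 1]) / (cum[j] - cum[j - 1])
        out.append(ts[j - 1] + frac * h)
    return out

def bernoulli_speed(t):
    """Fisher-Rao speed of alpha(t) = 1 - t on the Bernoulli family."""
    a = 1.0 - t
    return 1.0 / math.sqrt(a * (1.0 - a))

times = constant_speed_times(bernoulli_speed, 0.05, 0.95, n_out=10)
# Check with the closed-form Bernoulli distance 2 * |acos(sqrt(a)) - acos(sqrt(a'))|.
angles = [math.acos(math.sqrt(1.0 - t)) for t in times]
increments = [2.0 * abs(angles[k + 1] - angles[k]) for k in range(10)]
```

The resulting `times` cluster near the ends of the interval, where the speed is highest, and the Fisher–Rao increments between consecutive times are equal up to discretization error.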
For registration and matching in geometric measure spaces (e.g., varifolds under LDDMM–Fisher–Rao), optimal schedules are determined by solving Hamiltonian systems with Fisher–Rao controls, using numerical shooting algorithms, back-propagation through the ODE system, and efficient kernel summation (Hsieh et al., 2021).
6. Extensions and Significance
The optimal schedule concept generalizes beyond the Fisher–Rao metric. For other choices (Wasserstein, Sobolev, etc.), analogous geodesic equations lead to respective optimal schedules. In practical settings (e.g., discretized diffusion, image registration, non-commutative probability measures), Fisher–Rao–geodesic schedules are reported to provide not only theoretical efficiency but also improved empirical fidelity to underlying data or model constraints. The cosine schedule's empirical advantage for diffusion sampling is attributed to its Fisher–Rao-geodesic optimality (Zhang, 6 Aug 2025).
The existence and uniqueness of these geodesics are often guaranteed by strictly negative sectional curvature (as in the beta manifold (Brigant et al., 2019)) or by convexity of the variational functional (as in FRW and LDDMM–Fisher–Rao frameworks). The Fisher–Rao–geodesic optimal schedule thus unifies diverse applications where optimal transport, entropy minimization, and statistical efficiency intersect.