Riemannian Consistency Training (RCT)
- RCT is a geometric framework that enables ML models, especially generative ones, to remain consistent with the intrinsic manifold structure of their domain using tools such as the exponential map.
- It formulates consistency losses with geodesic distances and covariant derivatives so that tangent-space updates are handled correctly on non-Euclidean domains.
- RCT's efficient loss approximations and theoretical equivalence with consistency distillation offer practical benefits in applications like vision, robotics, and molecular modeling.
Riemannian Consistency Training (RCT) is a geometric framework for training machine learning models—especially generative models and neural networks—on curved spaces (Riemannian manifolds) so that their outputs, intermediate dynamics, or learned representations maintain consistency with the intrinsic manifold structure. RCT generalizes classical consistency objectives to settings in which both parameter spaces and data domains are non-Euclidean, requiring differential geometric tools such as exponential maps, covariant derivatives, and geometry-respecting loss functions.
1. Formulation of Consistency on Manifolds
A core challenge addressed by RCT is the generation or transformation of objects (e.g., samples, labels, hidden states) that inherently reside on a Riemannian manifold $\mathcal{M}$. Classical consistency training in Euclidean domains enforces agreement across multiple model outputs or trajectories (as in diffusion, flow matching, or denoising) using standard addition and Euclidean distances. On curved geometries, however, the lack of global vector addition and the necessity to translate vectors between different tangent spaces require the replacement of these operations with manifold-specific constructs.
The essential parameterization in RCT is the use of the Riemannian exponential map to encode a "residual" or update in a tangent space as

$$f_\theta(x_t, t) = \exp_{x_t}\!\big(c(t)\, v_\theta(x_t, t)\big),$$

where $x_t \in \mathcal{M}$ is a state along a probability-flow ODE trajectory, $c(t)$ is a time-dependent scalar schedule (vanishing at the trajectory endpoint to enforce the identity consistency condition there), and $v_\theta(x_t, t)$ is a learnable vector field defined in the tangent space $T_{x_t}\mathcal{M}$ (Cheng et al., 1 Oct 2025).
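As a concrete illustration, the sketch below instantiates this parameterization on the unit sphere, where the exponential map has a simple closed form. The helper names (`sphere_exp`, `consistency_fn`), the toy schedule `c`, and the random stand-in for `v_theta` are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: follow the geodesic from x along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

def project_to_tangent(x, w):
    """Project an ambient vector w onto the tangent space of the unit sphere at x."""
    return w - np.dot(x, w) * x

def consistency_fn(x_t, t, v_theta, c):
    """f_theta(x_t, t) = exp_{x_t}(c(t) * v_theta(x_t, t)); v_theta and c are placeholders."""
    v = project_to_tangent(x_t, v_theta(x_t, t))  # keep the update in the tangent space
    return sphere_exp(x_t, c(t) * v)

# Toy usage: a random stand-in for the learned field and a schedule vanishing at t = 0,
# so the map reduces to the identity at that endpoint (the paper's schedule may differ).
rng = np.random.default_rng(0)
v_theta = lambda x, t: rng.normal(size=x.shape)
c = lambda t: t
x = np.array([0.0, 0.0, 1.0])                     # a point on S^2
y = consistency_fn(x, 0.5, v_theta, c)
print(y, np.linalg.norm(y))                       # output stays on the sphere (norm ~ 1)
```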
Consistency losses are formulated using geodesic distances $d_{\mathcal{M}}(\cdot, \cdot)$, ensuring that the divergence between model-predicted states at different times reflects the actual geometry of $\mathcal{M}$. This is crucial for tasks such as generative modeling on spheres, tori, or rotation groups.
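The sketch below shows why a geodesic distance, rather than a Euclidean one, is the natural consistency metric on a curved space: for nearly antipodal points on the unit sphere, the chordal (Euclidean) distance saturates near 2 while the intrinsic separation approaches $\pi$. The function name `sphere_geodesic_distance` is illustrative.

```python
import numpy as np

def sphere_geodesic_distance(p, q):
    """Geodesic (great-circle) distance on the unit sphere: arccos of the inner product."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

p = np.array([1.0, 0.0, 0.0])
q = np.array([-1.0, 1e-3, 0.0])
q = q / np.linalg.norm(q)

print(np.linalg.norm(p - q))           # chordal distance, ~2.0
print(sphere_geodesic_distance(p, q))  # intrinsic distance, ~pi
```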
2. Training Objectives and Geometric Consistency Loss
The RCM (Riemannian Consistency Model) loss for RCT extends standard consistency losses by taking Riemannian geometry into account at each optimization step. In the discrete-time form with $N$ steps,

$$\mathcal{L}^N_{\mathrm{RCM}}(\theta) = \mathbb{E}\Big[\, w(t_n)\, d_{\mathcal{M}}\big(f_\theta(x_{t_{n+1}}, t_{n+1}),\, f_{\theta^-}(x_{t_n}, t_n)\big)^2 \Big],$$

where $\theta^-$ denotes a stop-gradient operation or a teacher model, and $w(t_n)$ weights the importance of each time step. As $N \to \infty$, the loss converges to a continuous-time variant built from the total time derivative of $f_\theta(x_t, t)$ along the trajectory; this derivative combines the differentials of the exponential map with respect to the tangent vector and the base point with the covariant derivative $\nabla_{\dot{x}_t} v_\theta$ of the vector field along the trajectory (Cheng et al., 1 Oct 2025).
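Under the reconstruction above, a discrete-time objective can be sketched as follows, again on the unit sphere. The squared geodesic distance, the stop-gradient target, and the scalar times and weights are simplifying assumptions; the paper's exact metric, weighting, and teacher construction may differ.

```python
import torch

def sphere_exp(x, v):
    """Batched exponential map on the unit sphere (x: [B, D] unit vectors, v: tangent vectors)."""
    nv = v.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    return torch.cos(nv) * x + torch.sin(nv) * v / nv

def geodesic_dist(p, q):
    """Batched great-circle distance between unit vectors."""
    return torch.arccos((p * q).sum(-1).clamp(-1 + 1e-7, 1 - 1e-7))

def consistency_fn(net, x, t, c):
    """Exp-map parameterization: project the network output to the tangent space, scale, map back."""
    v = net(x, t)
    v = v - (x * v).sum(-1, keepdim=True) * x
    return sphere_exp(x, c(t) * v)

def discrete_consistency_loss(net, x_tn, x_tn1, t_n, t_n1, c, w):
    """Compare predictions at adjacent times t_n < t_{n+1} on the same trajectory; the
    earlier-time prediction acts as a stop-gradient target (an EMA teacher could be
    substituted). Times and weights are plain floats in this sketch."""
    pred = consistency_fn(net, x_tn1, t_n1, c)
    with torch.no_grad():
        target = consistency_fn(net, x_tn, t_n, c)
    return (w(t_n) * geodesic_dist(pred, target) ** 2).mean()
```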
The loss can be interpreted as the deviation from "geodesic consistency": the model is penalized whenever its predictions deviate from the manifold's parallel transport of the consistency field. For flat manifolds (e.g., tori), the loss simplifies because the exponential map reduces to ordinary vector addition (modulo the lattice); for curved spaces, accounting for curvature via covariant derivatives and proper tangent-space translation is essential.
3. RCD–RCT Equivalence and Vector Field Consistency
RCT is typically instantiated in two variants:
- Riemannian Consistency Distillation (RCD): Relies on a teacher Riemannian flow-matching model providing the marginal vector field $u_t(x)$.
- Riemannian Consistency Training (RCT): Directly leverages the conditional vector field $u_t(x \mid x_1)$, conditioned on the data endpoint $x_1$.
A central theoretical result demonstrates the equivalence of these variants, enabled by a linearity property: the change in the denoiser prediction along the trajectory depends linearly on the driving vector field. Since the marginal field is the conditional expectation of the conditional field, $u_t(x) = \mathbb{E}\big[\,u_t(x \mid x_1) \mid x_t = x\,\big]$, substituting the conditional field into the loss leaves its expected value, and hence the training signal, unchanged.
Thus, the expensive step of separately training a flow-matching teacher can be circumvented, and RCT can use the conditional vector field directly with guaranteed equivalence (Cheng et al., 1 Oct 2025).
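For intuition, the sketch below gives one standard choice of conditional vector field, the geodesic-interpolant field used in Riemannian flow matching, instantiated on the unit sphere via the logarithm map. This particular construction is shown for illustration and is not necessarily the exact conditional field of the RCM paper.

```python
import numpy as np

def sphere_log(x, y):
    """Logarithm map on the unit sphere: the tangent vector at x pointing toward y,
    with length equal to the geodesic distance between x and y."""
    w = y - np.dot(x, y) * x                      # project y onto the tangent space at x
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(x)
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)) * w / nw

def conditional_vector_field(x_t, x_1, t):
    """Conditional field of a geodesic interpolant that transports x_t toward the
    data endpoint x_1 as t -> 1 (Riemannian flow-matching style construction)."""
    return sphere_log(x_t, x_1) / max(1.0 - t, 1e-6)
```

In the RCT variant, a conditional field of this kind replaces the teacher's marginal field when forming the consistency target; the equivalence result guarantees that the expected training signal is unchanged.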
4. Simplified Losses and Computational Reductions
Evaluating the full RCT loss involves differentiating the exponential map, which often requires cumbersome symbolic computations, especially for nontrivial manifolds. To address this, a simplified loss is proposed that approximates these differentials; the approximation is exact for symmetric exponential maps (e.g., on flat tori). Alternatively, an inner-product formulation of the loss can be used. These computationally tractable forms are particularly suited to large-scale implementation and achieve the same geometric objectives (Cheng et al., 1 Oct 2025).
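The flat torus makes the computational point concrete: its exponential map is plain addition modulo $2\pi$, so its differential is the identity and nothing needs to be differentiated symbolically, which is why the simplification is exact there. The helpers below are illustrative.

```python
import numpy as np

def torus_exp(x, v):
    """Exponential map on the flat torus [0, 2*pi)^d: coordinate-wise addition modulo 2*pi."""
    return np.mod(x + v, 2.0 * np.pi)

def torus_log(x, y):
    """Logarithm map on the flat torus: the shortest signed angular difference per coordinate."""
    return np.mod(y - x + np.pi, 2.0 * np.pi) - np.pi

x = np.array([0.1, 6.0])
v = np.array([0.5, 0.5])
print(torus_exp(x, v))                 # stays in [0, 2*pi)^2
print(torus_log(x, torus_exp(x, v)))   # recovers v: [0.5, 0.5]
```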
5. Geometric Kinematics and Interpretation
RCT's loss can be interpreted from a kinematics viewpoint: the training penalizes the "acceleration" of the model's predicted state along the probability flow trajectory. The total change of the denoised output decomposes into three geometric components:
- Discrepancy between the predicted ($v_\theta$) and reference (e.g., teacher or PF-ODE) vector fields.
- Intrinsic vector field variation along the flow, governed by the time derivative of the field and the schedule $c(t)$.
- Extrinsic change induced by the manifold's curvature, captured by the covariant derivative term $\nabla_{\dot{x}_t} v_\theta$.
Minimizing the RCT loss establishes a state of "infinitesimal equilibrium": if all three components vanish, predictions are geodesically consistent along the flow (Cheng et al., 1 Oct 2025). This ensures that the model does not just interpolate between fixed endpoints but learns curvature-aware updates that mimic the manifold's natural geometry.
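For an embedded manifold such as the unit sphere, the covariant derivative appearing in this decomposition can be approximated numerically: differentiate the vector field in the ambient space along a short geodesic step and project the result back onto the tangent space. The names and the finite-difference scheme below are illustrative assumptions.

```python
import numpy as np

def project_to_tangent(x, w):
    """Project an ambient vector onto the tangent space of the unit sphere at x."""
    return w - np.dot(x, w) * x

def sphere_exp(x, v):
    """Exponential map on the unit sphere."""
    n = np.linalg.norm(v)
    return x if n < 1e-12 else np.cos(n) * x + np.sin(n) * (v / n)

def covariant_derivative(vector_field, x, direction, eps=1e-5):
    """Finite-difference estimate of (nabla_direction V)(x) on the unit sphere:
    ambient directional derivative along a short geodesic, projected back to T_x."""
    x_plus = sphere_exp(x, eps * direction)
    ambient_derivative = (vector_field(x_plus) - vector_field(x)) / eps
    return project_to_tangent(x, ambient_derivative)

# Toy field: V(y) = tangential part of the constant ambient vector a = e_3.
a = np.array([0.0, 0.0, 1.0])
V = lambda y: project_to_tangent(y, a)
x = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
d = np.array([0.0, 1.0, 0.0])               # unit tangent direction at x
print(covariant_derivative(V, x, d))        # analytic value: -<x, a> * d ~ [0, -0.707, 0]
```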
6. Applications and Empirical Performance
RCT has demonstrated strong empirical results in generative modeling on a variety of manifolds, including the flat torus $\mathbb{T}^d$, spheres $\mathbb{S}^d$, and the special orthogonal group $\mathrm{SO}(3)$, where standard Euclidean approximations would fail to preserve crucial geometric properties. The use of a geometry-respecting exponential-map parameterization and geodesic-based losses results in:
- Improved sample quality in few-step generation regimes.
- Exact satisfaction of manifold constraints (e.g., unit normalization on spheres or orthogonality of rotation matrices), as illustrated in the sketch after this list.
- Empirical gains over prior consistency modeling approaches that do not account for non-Euclidean geometry (Cheng et al., 1 Oct 2025).
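As an example of the constraint-satisfaction point above, parameterizing updates through the exponential map keeps outputs on the manifold by construction. The sketch below checks this on SO(3), using the matrix exponential of a skew-symmetric tangent element; the helper names are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def so3_exp(R, omega):
    """Exponential map on SO(3) at rotation R along the tangent direction R @ Omega,
    where Omega is the skew-symmetric matrix built from omega in R^3."""
    Omega = np.array([[0.0, -omega[2], omega[1]],
                      [omega[2], 0.0, -omega[0]],
                      [-omega[1], omega[0], 0.0]])
    return R @ expm(Omega)

R_new = so3_exp(np.eye(3), np.array([0.3, -0.2, 0.1]))
print(np.allclose(R_new.T @ R_new, np.eye(3)))   # orthogonality holds up to float error
print(np.isclose(np.linalg.det(R_new), 1.0))     # determinant +1, so a valid rotation
```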
A key advantage is the closed-form construction for both discrete- and continuous-time objectives, alongside computationally efficient approximations via simplified losses.
7. Broader Theoretical and Methodological Implications
RCT as formalized in the RCM framework (Cheng et al., 1 Oct 2025) provides the first closed-form, general approach for consistency-based generative modeling over Riemannian manifolds. The theoretical results—such as the equivalence of distillation and training objectives and the explicit decomposition of the loss into intrinsic and curvature-induced components—support extensions to (a) manifold-valued data in vision, robotics, and molecular modeling; (b) structured deep learning architectures built on Riemannian geometries; and (c) scenarios requiring few-step or one-step generation consistent with complex geometric constraints.
A plausible implication is that any future generative modeling over non-Euclidean domains can leverage RCT to ensure both geometric fidelity and sampling efficiency, replacing ad hoc projection or manifold constraint techniques previously used. This approach systematically integrates the tools of differential geometry (exponential maps, covariant derivatives) into the design of stochastic generative models, opening new directions for theory and application in geometric machine learning.