Geodesic Loss Functions

Updated 26 February 2026

Geodesic loss functions are supervised learning objectives that measure distances along manifold geodesics rather than using Euclidean surrogates.
They have been effectively applied in rotation regression, protein docking, feature learning, and weight-space optimization to improve model robustness.
Efficient formulations with closed-form expressions and backpropagation-friendly gradients enable stable convergence in non-Euclidean settings.

Geodesic loss functions are a class of supervised learning objectives that explicitly encode non-Euclidean, Riemannian manifold structure into the loss landscape. These functions penalize discrepancies between predictions and targets by measuring distance along geodesics, the shortest paths under an intrinsic metric, rather than relying on naïve Euclidean or angular surrogates. As a result, geodesic losses provide invariance to parameterization and respect the true geometry of the output space or of the model’s latent space. They have been formulated and empirically validated in diverse settings, including rotation/rigid-body regression, deep classification features, neural weight space, structure-based protein docking, dimensionality reduction, multimodal representation alignment, and end-to-end articulated regression.

1. Mathematical Formulations of Geodesic Losses

Geodesic loss functions minimize the Riemannian distance between predicted and ground-truth quantities lying on a manifold $\mathcal{M}$ with metric $g$. For a pair $p, q \in \mathcal{M}$, the geodesic distance $d_{\rm geo}(p, q)$ is defined as the length of the shortest curve (geodesic) connecting $p$ and $q$:

$d_{\rm geo}(p, q) = \inf_\gamma \int_0^1 \|\dot\gamma(t)\|_{g_{\gamma(t)}} \, dt$

where the infimum runs over curves $\gamma$ with $\gamma(0) = p$ and $\gamma(1) = q$.

Specialized Examples:

Rotation group $\mathrm{SO}(3)$: For $R_1, R_2 \in \mathrm{SO}(3)$, the geodesic distance is the angle of the relative rotation,

$d_{\rm geo}(R_1, R_2) = \arccos \left( \frac{\operatorname{Tr}(R_1^\top R_2) - 1}{2} \right)$

or in quaternion representation,

$d_{\rm geo}(\mathbf{q}_1, \mathbf{q}_2) = 2\arccos\left(|\langle \mathbf{q}_1, \mathbf{q}_2 \rangle|\right)$

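A surrogate used for deep learning, which is quadratic in the inner product and equals $4\sin^2(d_{\rm geo}/2) \approx d_{\rm geo}^2$ for small angles, is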

$\mathcal{L}_{\mathrm{geo}}(\mathbf{q}_1, \mathbf{q}_2) = 4\left(1 - \langle \mathbf{q}_1, \mathbf{q}_2 \rangle^2 \right)$

(Lan et al., 20 Nov 2025, Salehi et al., 2018).
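The three $\mathrm{SO}(3)$ expressions above can be checked against each other numerically. The following is a minimal sketch in plain Python, not an implementation from the cited works: the function names, the `(w, x, y, z)` quaternion ordering, and the helper constructors `rot_z` and `quat_z` are assumptions made for illustration.

```python
import math

def rotation_geodesic(R1, R2):
    """Relative rotation angle in radians for two 3x3 rotation matrices (nested lists)."""
    # Tr(R1^T R2) equals the sum of elementwise products of R1 and R2.
    trace = sum(R1[i][j] * R2[i][j] for i in range(3) for j in range(3))
    cos_theta = (trace - 1.0) / 2.0
    # Clamp so round-off cannot push the argument outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

def quaternion_geodesic(q1, q2):
    """2 * arccos(|<q1, q2>|) for unit quaternions given as (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))

def quaternion_surrogate(q1, q2):
    """Surrogate 4 * (1 - <q1, q2>^2), which avoids arccos entirely."""
    dot = sum(a * b for a, b in zip(q1, q2))
    return 4.0 * (1.0 - dot * dot)

def rot_z(angle):
    """Hypothetical helper: rotation about the z axis as a matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def quat_z(angle):
    """Hypothetical helper: the same rotation as a unit quaternion."""
    return (math.cos(angle / 2.0), 0.0, 0.0, math.sin(angle / 2.0))

# Two rotations about z that differ by 0.7 rad.
theta_matrix = rotation_geodesic(rot_z(0.3), rot_z(1.0))
theta_quat = quaternion_geodesic(quat_z(0.3), quat_z(1.0))
surrogate = quaternion_surrogate(quat_z(0.3), quat_z(1.0))
print(theta_matrix, theta_quat, surrogate)
```

Both distance functions return the same relative angle, while the surrogate evaluates to $4\sin^2(\theta/2)$, slightly below $\theta^2$ at this angle.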

Hypersphere $S^{d-1}$: For normalized features $z_i, z_j \in S^{d-1}$,

$d_{\rm geo}(z_i, z_j) = \arccos(\langle z_i, z_j \rangle)$

(Choi et al., 2020).
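A minimal sketch of the hyperspherical distance, assuming features arrive as plain Python lists and are nonzero so that they can be projected onto the unit sphere first. The names `normalize` and `sphere_geodesic` are illustrative and not taken from the cited work.

```python
import math

def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sphere_geodesic(u, v):
    """Great-circle distance arccos(<u, v>) between the normalized inputs."""
    u, v = normalize(u), normalize(v)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1]: round-off in the dot product can otherwise make acos fail.
    return math.acos(max(-1.0, min(1.0, dot)))

orthogonal = sphere_geodesic([1.0, 0.0, 0.0], [0.0, 2.0, 0.0])
antipodal = sphere_geodesic([1.0, 0.0], [-3.0, 0.0])
print(orthogonal, antipodal)
```

Orthogonal features are a quarter circle apart and antipodal features are half a circle apart, regardless of the input norms.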

Weight-space manifolds: With the Gauss–Newton pullback metric $g$ and a path $\gamma$ in weight space, the geodesic loss is the integrated action (path energy)

$S[\gamma] = \int_0^1 \dot\gamma(t)^\top g(\gamma(t)) \dot\gamma(t) dt$

(Raghavan et al., 2021).
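The action integral can be approximated on a discretized path. The sketch below assumes the pullback metric has the form $g = J^\top J$, with $J$ the Jacobian of model outputs with respect to weights, and uses a hypothetical one-output toy model $f(w) = w_0 w_1$. The midpoint discretization and all names are illustrative choices, not the procedure of the cited work.

```python
def gauss_newton_metric(jacobian):
    """g = J^T J for a Jacobian given as a list of rows."""
    n = len(jacobian[0])
    return [[sum(row[i] * row[j] for row in jacobian) for j in range(n)]
            for i in range(n)]

def toy_jacobian(w):
    """Jacobian of the hypothetical toy model f(w) = w[0] * w[1]."""
    return [[w[1], w[0]]]

def path_action(points, jacobian_fn):
    """Approximate S = int gamma'^T g gamma' dt for points at uniform times in [0, 1]."""
    dt = 1.0 / (len(points) - 1)
    total = 0.0
    for a, b in zip(points[:-1], points[1:]):
        delta = [y - x for x, y in zip(a, b)]
        mid = [(x + y) / 2.0 for x, y in zip(a, b)]
        g = gauss_newton_metric(jacobian_fn(mid))  # metric at the segment midpoint
        quad = sum(delta[i] * g[i][j] * delta[j]
                   for i in range(len(delta)) for j in range(len(delta)))
        total += quad / dt  # (delta/dt)^T g (delta/dt) * dt
    return total

n = 100
# Two paths with the same endpoints (1, 1) and (2, 0.5).
linear = [(1.0 + k / n, 1.0 - 0.5 * k / n) for k in range(n + 1)]
level_set = [(1.0 + k / n, 1.0 / (1.0 + k / n)) for k in range(n + 1)]

action_linear = path_action(linear, toy_jacobian)
action_level_set = path_action(level_set, toy_jacobian)
print(action_linear, action_level_set)
```

The path along the level set $w_0 w_1 = 1$ never changes the model output, so its action is numerically zero, while straight-line interpolation between the same endpoints has strictly positive action. This is the sense in which low-action paths preserve function while moving through weight space.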

Rigid-body frames / $\mathrm{SE}(3)$: The distance on $\mathrm{SE}(3)$ combines the rotation geodesic with a translation term scaled by a length parameter $\alpha$:

$d_\theta((R, \mathbf{t}), (R', \mathbf{t}'); \alpha) = \sqrt{d_{\rm geo}(R, R')^2 + \left\| \frac{\mathbf{t} - \mathbf{t}'}{\alpha} \right\|_2^2 }$

(Wu et al., 2024).
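A minimal sketch of this combined distance in plain Python. The value of `alpha` in the example and the frame representation (a nested-list rotation matrix plus a translation tuple) are assumptions for illustration; the cited work's implementation may differ.

```python
import math

def rotation_angle(R1, R2):
    """Relative rotation angle from Tr(R1^T R2), with clamping against round-off."""
    trace = sum(R1[i][j] * R2[i][j] for i in range(3) for j in range(3))
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))

def se3_distance(R1, t1, R2, t2, alpha):
    """sqrt(rotation_angle^2 + ||(t1 - t2) / alpha||^2)."""
    rot = rotation_angle(R1, R2)
    trans_sq = sum(((a - b) / alpha) ** 2 for a, b in zip(t1, t2))
    return math.sqrt(rot * rot + trans_sq)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# alpha = 10.0 is an assumed length scale: a 5-unit translation offset
# contributes 0.5 to the distance, alongside the pi/2 rotation offset.
d = se3_distance(identity, (0.0, 0.0, 0.0), quarter_turn_z, (3.0, 0.0, 4.0), alpha=10.0)
print(d)
```

The parameter $\alpha$ sets how many length units of translation error count the same as one radian of rotation error, so it needs to be chosen with the data's units in mind.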

Geodesic paths in latent spaces: For dimensionality reduction, the loss matches geodesic distances in ambient and embedding space,

$L_{\rm geo} = \sum_{(i, j)} \frac{(|\rho_{\rm geo}^{(i, j)} - \Delta_{\rm geo}^{(i, j)}|)^2}{\Delta_{\rm geo}^{(i, j)} + \beta}$

with $\rho_{\rm geo}^{(i, j)}$ the path length in embedding space, $\Delta_{\rm geo}^{(i, j)}$ the corresponding geodesic distance in data space, and $\beta$ an offset added to the denominator (Euchner et al., 2024).
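A minimal sketch of the geodesic-matching loss for a single pair, assuming the embedded path is given as a list of coordinate tuples and that the data-space distance has already been computed. The function names and the value of `beta` are illustrative assumptions.

```python
import math

def embedded_path_length(points):
    """Sum of Euclidean segment lengths along a path of embedded points."""
    return sum(math.dist(a, b) for a, b in zip(points[:-1], points[1:]))

def geodesic_matching_loss(rho, delta, beta):
    """Sum over pairs of (rho - delta)^2 / (delta + beta)."""
    return sum((r - d) ** 2 / (d + beta) for r, d in zip(rho, delta))

# One pair (i, j): the embedded path passes through one intermediate point.
path = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
rho = [embedded_path_length(path)]  # segment lengths 5 and 6
delta = [10.0]                      # assumed data-space geodesic distance
loss = geodesic_matching_loss(rho, delta, beta=1.0)
print(rho[0], loss)
```

Dividing by $\Delta_{\rm geo} + \beta$ weights errors on short distances more heavily than the same absolute error on long ones.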

2. Core Applications and Empirical Outcomes

Rotation and Rigid Registration: Geodesic losses for $\mathrm{SO}(3)$ rotation regression provide bi-invariant, parameterization-free errors. In deep pose estimation and MoCap from rigid markers, replacing MSE with geodesic loss on axis–angle or quaternion representations dramatically improves angular accuracy, especially for large rotations, and yields stable gradients for end-to-end learning (Lan et al., 20 Nov 2025, Salehi et al., 2018).

Protein Structure and Docking: In protein complex modeling, the FAPE (Frame Aligned Point Error) loss can exhibit vanishing gradients for large rotational errors due to its chordal metric basis. The geodesic F2E (Frame-Aligned Frame Error) loss, operating on true group-theoretic distances in $\mathrm{SE}(3)$, avoids this pathology, producing stable and effective optimization for hard targets, as evidenced by major increases in correct docking rates on antibody-antigen data (Wu et al., 2024).

Feature Learning and Classification: For deep feature learning, geodesic (angular) margin losses such as AMC-Loss on the hypersphere result in more compact class clusters and clearer boundaries. While quantitative accuracy gains are modest, the qualitative improvements in interpretability (e.g., Grad-CAM localization) and feature homogeneity are substantial (Choi et al., 2020).

Neural Weight-Space Geometry: In weight space, geodesic losses permit the explicit construction of low-action paths connecting network states, enabling efficient model sparsification, mitigation of catastrophic forgetting, and mode connectivity. Empirical studies demonstrate that these geodesic paths consistently maintain high classification accuracy, outperforming linear interpolation or standard fine-tuning across tasks and architectures (Raghavan et al., 2021).

Dimensionality Reduction and Channel Charting: Geodesic-to-geodesic losses allow neural channel charting models to faithfully capture complex, nonconvex spatial geometries from pairwise dissimilarities without explicit coordinates, overcoming distortion from Euclidean metrics. Incorporating uncertainty in geodesic pseudo-distances further improves robustness and accuracy in localization (Euchner et al., 2024).

Multimodal Representation Alignment: In CLIP-guided image morphing, geodesic cosine similarity measured along the geodesic flow between projected subspaces in feature space rectifies instabilities of naïve CLIP directional losses, yielding improved stability, plasticity, and photorealistic transformations (Oh et al., 2024).

3. Implementation Details and Differentiability

Most geodesic losses are efficiently expressible in closed form and permit back-propagation with minimal modification to standard frameworks:

SO(3) rotation losses are computed with trace–arccos or quaternion inner product formulas; the quadratic surrogate $4(1 - \langle \mathbf{q}_1, \mathbf{q}_2 \rangle^2)$ is preferred for numerical stability, with gradients handled via automatic differentiation (Lan et al., 20 Nov 2025, Salehi et al., 2018).
Hyperspherical geodesic losses use $\arccos$ of the feature inner product, with chain rule derivatives for backpropagation (Choi et al., 2020).
Weight-space geodesics require pullback metric Jacobians and iterative quadratic subproblems, because direct Christoffel symbol computation is infeasible at scale (Raghavan et al., 2021).
Protein docking losses in $\mathrm{SE}(3)$ are differentiable without SVD or Procrustes steps, using direct closed-form relative frame constructions (Wu et al., 2024).
Channel charting geodesic losses involve shortest-path computation for geodesics over sparse graphs and efficient batched summing of latent path segment lengths (Euchner et al., 2024).
CLIP subspace geodesic losses necessitate low-rank PCA and SVD computations for Grassmannian interpolation, then reduce to quadratic forms in the projected space (Oh et al., 2024).
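For the graph-based case in the list above, geodesic pseudo-distances can be obtained by a shortest-path search over a sparse neighbor graph. The sketch below uses Dijkstra's algorithm with a binary heap. This is one standard choice assumed here for illustration, and the source does not state which algorithm the cited work uses. The adjacency format and node labels are also hypothetical.

```python
import heapq
import math

def shortest_path_lengths(adjacency, source):
    """Dijkstra over a sparse graph given as {node: [(neighbor, weight), ...]}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry; a shorter route was already found
        for neighbor, weight in adjacency.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# A chain 0-1-2-3 bent into a U shape: nodes 0 and 3 may be close in
# Euclidean terms, but the only route between them follows the chain.
adjacency = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0)],
}
dist_from_0 = shortest_path_lengths(adjacency, 0)
print(dist_from_0)
```

The graph distance between the two ends of the chain is the full path length, which is the quantity a geodesic loss would ask the embedding to reproduce.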

4. Theoretical and Empirical Advantages

Geodesic loss functions offer several essential properties:

Metric invariance and coordinate-independence: Errors depend only on the intrinsic relative relationship of predictions and targets, not on input parameterization.
Improved stability and convergence: In contrast to MSE or chordal losses, geodesic penalties avoid “over-penalizing” or “under-penalizing” near antipodal or wrap-around regions of the manifold; gradients do not vanish or explode at large manifold distances (Salehi et al., 2018, Wu et al., 2024).
Preservation of task and semantic structure: Across weight-space, latent space, or feature space, geodesic losses enforce consistency that is compatible with the true geometry of the data or model, enhancing robustness in transfer, continual learning, and cross-modal morphing (Raghavan et al., 2021, Oh et al., 2024, Euchner et al., 2024).
Computational tractability: Most practical geodesic surrogates are compatible with first-order stochastic optimization and automatic differentiation (Lan et al., 20 Nov 2025, Choi et al., 2020, Salehi et al., 2018).
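
The vanishing-gradient contrast in the list above can be made concrete on $\mathrm{SO}(3)$. As an illustrative assumption (this specific form is not taken from the cited works), take the chordal loss to be the squared Frobenius distance between rotation matrices, and let $\theta = d_{\rm geo}(R_1, R_2)$. Using $\operatorname{Tr}(R_1^\top R_2) = 1 + 2\cos\theta$,

$\|R_1 - R_2\|_F^2 = 6 - 2\operatorname{Tr}(R_1^\top R_2) = 4 - 4\cos\theta = 8\sin^2(\theta/2)$

$\frac{d}{d\theta}\|R_1 - R_2\|_F^2 = 4\sin\theta \to 0 \ \text{as}\ \theta \to \pi, \qquad \frac{d}{d\theta}\,\theta^2 = 2\theta \to 2\pi \ \text{as}\ \theta \to \pi$

Under this assumption the chordal gradient fades exactly where the rotational error is largest, while the squared geodesic loss keeps a gradient proportional to the error.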

5. Representative Algorithms and Training Regimes

Application Domain	Manifold / Group	Geodesic Loss (Key Formula)
Rotation Regression	$\mathrm{SO}(3)$	$4(1 - \langle \mathbf{q}_1, \mathbf{q}_2 \rangle^2)$
Feature Learning	$S^{d-1}$	$[\arccos(\langle z_i, z_j\rangle)]^2$ with margin
Protein Docking	$\mathrm{SE}(3)$	$d_\theta(\hat{T}_{i\to j},T_{i\to j};\alpha)$
Channel Charting	Nonlinear Manifold	$(|\rho_{\rm geo}^{(i, j)} - \Delta_{\rm geo}^{(i, j)}|)^2 / (\Delta_{\rm geo}^{(i, j)} + \beta)$
Weight Space Optimization	$\mathbb{R}^n$ with $g$	$\int_0^1 \dot\gamma^\top g\dot\gamma dt$
CLIP Alignment	Grassmannian $\mathcal{G}(N, D)$	$1-\text{sim}_{\text{geo}}(z_t, z_{t+1})$

Training strategies include two-stage MSE-pretraining with geodesic refinement on $\mathrm{SO}(3)$ (Salehi et al., 2018); subsampled or annealed geodesic path enforcement in graph-based settings (Euchner et al., 2024); and dual-term (inter/intra) geodesic regularization for multimodal tasks (Oh et al., 2024). Efficient batching, careful normalization, and margin or weighting schedules are typical.

6. Limitations and Open Directions

While geodesic loss functions offer principled and empirically justified improvements, several constraints and open areas remain:

Computational cost can become prohibitive on high-dimensional, densely connected manifolds (e.g., weight-space Christoffel symbol evaluation or all-pairs shortest-path) (Raghavan et al., 2021, Euchner et al., 2024).
Hyperparameter tuning (e.g., margin values, relative axis scales, annealing schedules) is often required to achieve optimal performance.
Numerical stability requires careful implementation when $\langle \cdot, \cdot \rangle \to \pm1$ or near antipodal points; surrogate polynomial losses are commonly adopted (Lan et al., 20 Nov 2025).
Generalization to arbitrary manifolds is less explored compared to classical groups, but manifold-aware loss design (e.g., on the Grassmannian or in learned subspaces) is an emerging research frontier (Oh et al., 2024).
A plausible implication is that, as architectures become more manifold-structured or downstream tasks require alignment of heterogeneous latent factors, geodesic losses will play a growing role, motivating advances in scalable Riemannian optimization and adaptable metric learning.
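
The numerical-stability limitation listed above can be illustrated by comparing derivatives with respect to the inner product $x = \langle \mathbf{q}_1, \mathbf{q}_2 \rangle$. The sketch below assumes the unsquared quaternion distance $2\arccos(x)$ on $0 < x < 1$ and the polynomial surrogate $4(1 - x^2)$. The function names and sample points are illustrative.

```python
import math

def grad_arccos_distance(x):
    """d/dx of 2 * arccos(x), valid for -1 < x < 1."""
    return -2.0 / math.sqrt(1.0 - x * x)

def grad_surrogate(x):
    """d/dx of 4 * (1 - x^2)."""
    return -8.0 * x

# As x approaches 1 (prediction nearly equal to target), the arccos
# derivative grows without bound while the surrogate derivative stays bounded.
for x in (0.5, 1.0 - 1e-6, 1.0 - 1e-12):
    print(x, grad_arccos_distance(x), grad_surrogate(x))

near_one = 1.0 - 1e-12
```

Clamping the argument of $\arccos$ prevents domain errors but does not bound this derivative, which is one reason polynomial surrogates are attractive in practice.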

7. Summary of Empirical Impact

Across domains, geodesic losses have demonstrated statistically significant reductions in error rates, improved robustness to large manifold displacements, superior semantic or physical fidelity, and qualitative advances in interpretability and controllability. In rotation estimation, switching from MSE to geodesic loss halves angular error. In protein docking, geodesic losses are essential for optimizing high-rotation configurations. In deep feature learning, angular margin geodesic losses yield more clusterable and interpretable representations. In channel charting, geodesic-matched mapping is necessary for preserving nontrivial spatial topologies (Lan et al., 20 Nov 2025, Salehi et al., 2018, Wu et al., 2024, Choi et al., 2020, Euchner et al., 2024, Raghavan et al., 2021, Oh et al., 2024).