Principal Geodesic Analysis

Updated 2 March 2026

Principal Geodesic Analysis is a framework that extends PCA to nonlinear manifolds by replacing linear subspaces with geodesic submanifolds.
It uses Riemannian geometry tools such as exponential and logarithmic maps, Fréchet means, and tangent-space eigen-decomposition to extract principal directions.
PGA respects the intrinsic geometry of data on spheres, SPD matrices, and Grassmannians, enabling dimensionality reduction without the distortions of Euclidean projection.

Principal Geodesic Analysis (PGA) generalizes classical Principal Component Analysis (PCA) to data constrained to nonlinear manifolds, replacing Euclidean linear subspaces with geodesic submanifolds that respect the intrinsic geometry of the data domain. Unlike PCA, which projects data onto lines or planes in a vector space, PGA utilizes the exponential and logarithmic maps of Riemannian geometry to linearize manifold-valued data around their intrinsic mean, compute variance-maximizing directions in the tangent space, and map those directions back to the manifold as geodesic flows. This technique is essential for data types that naturally reside on manifolds—such as spheres, Grassmannians, symmetric positive-definite (SPD) matrices, shape spaces, and spaces of probability measures—and provides dimensionality reduction that preserves the underlying geometric structure, thus avoiding the distortions inherent in Euclidean projections (Ichi et al., 5 Feb 2026, Rodríguez, 30 May 2025).

1. Core Mathematical Framework

Let $(M,g)$ denote a complete $d$-dimensional Riemannian manifold with metric $g$. Given $N$ data points $p_1, \dots, p_N \in M$:

Fréchet (Karcher) Mean: The intrinsic mean $\bar{p}$ is defined as

$\bar{p} = \arg\min_{p \in M} \sum_{i=1}^N d_M^2(p, p_i),$

where $d_M$ is the geodesic distance.

Logarithm Map to Tangent Space: Data points are mapped to the tangent space at $\bar{p}$ via

$v_i = \operatorname{log}_{\bar{p}}(p_i) \in T_{\bar{p}}M,$

where $\log_{\bar{p}}$ is the inverse of the exponential map near $\bar{p}$ (i.e., $v_i$ solves $\exp_{\bar{p}}(v_i) = p_i$).

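Covariance Operator: Compute the empirical covariance matrix in $T_{\bar{p}}M$:

$C = \frac{1}{N} \sum_{i=1}^N v_i v_i^\top.$ No explicit centering is needed, because the first-order condition of the Fréchet mean gives $\sum_{i=1}^N v_i = 0$.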

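Principal Geodesic Directions: Eigen-decompose $C = \sum_{k=1}^{d} \lambda_k u_k u_k^\top$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0$. The leading eigenvectors $u_1, \dots, u_K$ ($K \le d$) define orthonormal principal geodesic directions.
Projection and Reconstruction: The low-dimensional reconstruction of $p_i$ is

$\hat{p}_i = \exp_{\bar{p}}\Big(\sum_{k=1}^{K} c_{ik}\, u_k\Big), \qquad c_{ik} = \langle v_i, u_k \rangle_{\bar{p}},$

where the $c_{ik}$ are PGA scores (Ichi et al., 5 Feb 2026, Rodríguez, 30 May 2025).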

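This workflow is the direct Riemannian generalization of linear PCA and reduces to the classical case when $M = \mathbb{R}^d$ with the Euclidean metric, where $\exp_{\bar{p}}(v) = \bar{p} + v$, $\log_{\bar{p}}(p) = p - \bar{p}$, and $\bar{p}$ is the arithmetic mean.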
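The five steps above can be sketched in a few lines of NumPy. This is a minimal linearized (tangent-space) sketch rather than a reference implementation: the helper name `tangent_pga` and its signature are hypothetical, points are assumed to be given in ambient coordinates, the Fréchet mean is assumed to be precomputed, and the caller supplies `log_map(p, q)` and `exp_map(p, v)`. Plugging in the Euclidean maps recovers ordinary PCA, which is the reduction described in the previous paragraph.

```python
import numpy as np

def tangent_pga(points, mean, log_map, exp_map, n_components):
    """Linearized PGA around a precomputed Fréchet mean (hypothetical helper)."""
    # Log-map every point into the tangent space at the mean: v_i = log_mean(p_i).
    V = np.array([log_map(mean, q) for q in points])          # shape (N, D)
    # Empirical tangent covariance C = (1/N) sum_i v_i v_i^T.
    C = V.T @ V / len(points)
    # Eigen-decompose; eigh returns ascending eigenvalues, so sort descending.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:n_components]
    U = eigvecs[:, order]                                      # shape (D, K)
    # PGA scores c_ik = <v_i, u_k> and reconstructions exp_mean(sum_k c_ik u_k).
    scores = V @ U
    recon = np.array([exp_map(mean, U @ c) for c in scores])
    return eigvals[order], U, scores, recon

# Euclidean sanity check: with exp/log as vector addition/subtraction,
# tangent PGA should coincide with classical PCA.
def euclid_log(p, q):
    return q - p

def euclid_exp(p, v):
    return p + v

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])
mean = X.mean(axis=0)  # the Fréchet mean in R^d is the arithmetic mean
lam, U, scores, recon = tangent_pga(X, mean, euclid_log, euclid_exp, 2)
```

For points on an embedded manifold such as $S^d \subset \mathbb{R}^{d+1}$, every $v_i$ is orthogonal to $\bar{p}$, so the ambient covariance only gains a zero eigenvalue in the normal direction and the leading eigenvectors still lie in $T_{\bar{p}}M$.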

2. Riemannian Geometry Primitives

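Geodesic Distance: $d_M(p, q)$ is the length of the shortest path joining $p$ and $q$, computed as the integral of the norm of the velocity vector along a minimizing geodesic.
Exponential Map: $\exp_p : T_pM \to M$ maps a tangent vector $v$ to the point reached at unit time by the geodesic starting at $p$ with initial velocity $v$.
Logarithm Map: $\log_p$ is defined locally as the inverse of $\exp_p$, on the region inside the injectivity radius of $p$.
Fréchet Mean Uniqueness: On complete, simply connected manifolds of nonpositive curvature (Hadamard manifolds), the Fréchet mean exists and is unique. On compact manifolds such as spheres it need not be unique (e.g., for two antipodal points); uniqueness is guaranteed when the data lie in a sufficiently small geodesic ball, whose admissible radius depends on the injectivity radius and an upper bound on sectional curvature.
Parallel Transport: In more advanced variants (e.g., for subspace deflation), parallel transport moves vectors along geodesics while preserving inner products (and hence norms), keeping tangent-space computations consistent across base points.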
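On the unit sphere these primitives have simple closed forms, so the properties listed above can be checked numerically. The sketch below uses hypothetical helper names and assumes $p$ and $q$ are not antipodal (where the minimizing geodesic is not unique). It computes $d_M(p, q)$, confirms that it equals $\|\log_p(q)\|$, and transports a tangent vector along the minimizing great circle: the component along the direction of travel rotates with the geodesic, while the component orthogonal to the plane of the geodesic is unchanged.

```python
import numpy as np

def sphere_log(p, q):
    """log_p(q) on the unit sphere; assumes q is not antipodal to p."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (q - cos_t * p)

def sphere_dist(p, q):
    """Geodesic (great-circle) distance d(p, q) = arccos <p, q>."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def sphere_transport(p, q, w):
    """Parallel transport of w in T_p S^d to T_q S^d along the minimizing geodesic."""
    v = sphere_log(p, q)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return w.copy()
    u = v / theta            # unit initial velocity of the geodesic from p to q
    a = np.dot(u, w)         # component of w along the direction of travel
    # That component becomes a * (geodesic velocity at q) = a * (-sin(theta) p + cos(theta) u);
    # the part of w orthogonal to span(p, u) is left unchanged.
    return w - a * ((1.0 - np.cos(theta)) * u + np.sin(theta) * p)

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
w1 = np.array([1.0, 2.0, 0.0])   # tangent at p (orthogonal to p)
w2 = np.array([0.0, -1.0, 0.0])
t1 = sphere_transport(p, q, w1)  # approximately (0, 2, -1)
t2 = sphere_transport(p, q, w2)  # (0, -1, 0): orthogonal to the plane of travel
```

Both transported vectors are tangent at $q$, and their inner product equals $\langle w_1, w_2 \rangle$, as parallel transport requires.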

Closed-form expressions for $\exp$ and $\log$ are available for canonical cases: spheres ($S^d$), SPD matrices under the affine-invariant Riemannian metric (AIRM), and Grassmannians (Ichi et al., 5 Feb 2026, Giovanis et al., 2024).
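For the Grassmannian, a common choice represents a point of $\mathrm{Gr}(k, n)$ by an $n \times k$ matrix with orthonormal columns and computes both maps with a thin SVD. The sketch below uses these widely used SVD-based formulas; the function names are hypothetical, and the log map assumes $X^\top Y$ is invertible (no principal angle between the two subspaces equals $\pi/2$). The round trip is checked by comparing orthogonal projectors, because an orthonormal basis of a subspace is only determined up to rotation.

```python
import numpy as np

def grassmann_exp(X, Delta):
    """exp_X(Delta) on Gr(k, n); X is n x k orthonormal and X.T @ Delta = 0."""
    U, s, Vt = np.linalg.svd(Delta, full_matrices=False)
    # Geodesic from span(X): basis directions rotate by the angles s toward U.
    return X @ Vt.T @ np.diag(np.cos(s)) @ Vt + U @ np.diag(np.sin(s)) @ Vt

def grassmann_log(X, Y):
    """log_X(Y) on Gr(k, n); assumes X.T @ Y is invertible."""
    XtY = X.T @ Y
    # Part of Y orthogonal to span(X), expressed in a basis adapted to X.
    M = (Y - X @ XtY) @ np.linalg.inv(XtY)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # The singular values s are tangents of the principal angles.
    return U @ np.diag(np.arctan(s)) @ Vt

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.normal(size=(6, 2)))  # a random 2-plane in R^6
Y, _ = np.linalg.qr(rng.normal(size=(6, 2)))
Delta = grassmann_log(X, Y)                   # tangent vector at X
Z = grassmann_exp(X, Delta)                   # should span the same plane as Y
```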

3. Algorithmic and Computational Aspects

PGA’s computational pipeline involves the following steps, where $c_{\log}$ denotes the cost of one log-map evaluation:

Step	Operations	Typical complexity
Fréchet mean (per iteration)	$N$ log-maps, one exp-map update	$O(N\,c_{\log})$
Log-mapping data	$N$ log-maps	$O(N\,c_{\log})$
Tangent covariance	Forming $N$ outer products in $\mathbb{R}^{d}$	$O(N d^2)$
Eigen-decomposition	Tangent covariance eigendecomposition	$O(d^3)$

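For the unit hypersphere $S^{d} = \{x \in \mathbb{R}^{d+1} : \|x\| = 1\}$:

$\exp_p(v) = \cos(\|v\|)\, p + \sin(\|v\|)\, \frac{v}{\|v\|} \quad (v \neq 0),$
$\log_p(q) = \frac{\theta}{\sin\theta}\,\big(q - \cos\theta\, p\big), \quad \theta = \arccos\langle p, q\rangle \in (0, \pi).$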
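The sphere formulas translate directly into code. The sketch below implements them and uses them for the Fréchet-mean step from the table above: each iteration performs $N$ log-maps and one exp-map update, i.e., Riemannian gradient descent on $\frac{1}{2N}\sum_i d_M^2(p, p_i)$ with unit step. The step size, tolerance, initialization at the first data point, and all function names are assumptions of this sketch, and convergence should only be expected for well-concentrated data.

```python
import numpy as np

def sphere_exp(p, v):
    """exp_p(v) = cos(|v|) p + sin(|v|) v / |v| on the unit sphere."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    """log_p(q) = theta / sin(theta) (q - cos(theta) p); assumes q != -p."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (q - cos_t * p)

def karcher_mean(points, step=1.0, tol=1e-10, max_iter=100):
    """Fréchet mean on the sphere by Riemannian gradient descent (a sketch)."""
    p = points[0].copy()  # initialize at a data point
    for it in range(1, max_iter + 1):
        # N log-maps: their average is the negative Riemannian gradient
        # of (1/2N) sum_i d(p, p_i)^2.
        mean_log = np.mean([sphere_log(p, q) for q in points], axis=0)
        # One exp-map update along that direction.
        p = sphere_exp(p, step * mean_log)
        if np.linalg.norm(mean_log) < tol:
            break
    return p, it

rng = np.random.default_rng(0)
raw = np.array([0.0, 0.0, 1.0]) + 0.2 * rng.normal(size=(50, 3))
pts = raw / np.linalg.norm(raw, axis=1, keepdims=True)  # points near the north pole
mean, n_iter = karcher_mean(pts)
```

At convergence the log-mapped data average to zero at `mean`, which is the first-order condition that lets the tangent covariance skip explicit centering.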

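For SPD matrices under the AIRM, with $\exp$ and $\log$ on the right-hand side denoting the matrix exponential and logarithm:

$\exp_P(V) = P^{1/2} \exp\!\big(P^{-1/2} V P^{-1/2}\big) P^{1/2},$
$\log_P(Q) = P^{1/2} \log\!\big(P^{-1/2} Q P^{-1/2}\big) P^{1/2}.$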
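The AIRM maps only need functions of symmetric matrices, which can be evaluated through an eigendecomposition. The sketch below uses hypothetical helper names and assumes symmetric inputs with $P$ and $Q$ positive definite. Besides the two maps, it includes the corresponding AIRM geodesic distance $d(P, Q) = \|\log(P^{-1/2} Q P^{-1/2})\|_F$ and checks that $\exp_P(\log_P(Q)) = Q$.

```python
import numpy as np

def sym_fn(S, fn):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, Q = np.linalg.eigh((S + S.T) / 2)  # symmetrize to absorb round-off
    return (Q * fn(w)) @ Q.T              # Q diag(fn(w)) Q^T

def spd_exp(P, V):
    """AIRM exponential map: P^{1/2} expm(P^{-1/2} V P^{-1/2}) P^{1/2}."""
    P_half = sym_fn(P, np.sqrt)
    P_ihalf = sym_fn(P, lambda w: 1.0 / np.sqrt(w))
    return P_half @ sym_fn(P_ihalf @ V @ P_ihalf, np.exp) @ P_half

def spd_log(P, Q):
    """AIRM logarithm map: P^{1/2} logm(P^{-1/2} Q P^{-1/2}) P^{1/2}."""
    P_half = sym_fn(P, np.sqrt)
    P_ihalf = sym_fn(P, lambda w: 1.0 / np.sqrt(w))
    return P_half @ sym_fn(P_ihalf @ Q @ P_ihalf, np.log) @ P_half

def spd_dist(P, Q):
    """AIRM geodesic distance ||logm(P^{-1/2} Q P^{-1/2})||_F."""
    P_ihalf = sym_fn(P, lambda w: 1.0 / np.sqrt(w))
    S = P_ihalf @ Q @ P_ihalf
    return np.sqrt(np.sum(np.log(np.linalg.eigvalsh((S + S.T) / 2)) ** 2))

P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 3.0]])
V = spd_log(P, Q)       # a symmetric tangent vector at P
Q_back = spd_exp(P, V)  # recovers Q up to round-off
```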

The overall complexity is typically $O\big(T N c_{\log} + N d^2 + d^3\big)$, where $T$ is the mean-iteration count and $c_{\log}$ the cost of one log-map evaluation. The dominant costs arise in high dimensions and for expensive log/exp evaluations (Ichi et al., 5 Feb 2026).

4. Exact vs. Linearized PGA and Curvature Effects

Two broad classes of algorithms are prevalent:

Linearized (Tangent-Space) PGA: Applies PCA to $\{v_i\}$ in $T_{\bar{p}}M$ and maps principal directions back to $M$ via $\exp_{\bar{p}}$. Accurate when data are concentrated within a convex normal neighborhood of $\bar{p}$.
Exact PGA: Optimizes principal geodesic subspaces directly on $M$. For a 1D principal geodesic, it searches for the geodesic through $\bar{p}$ that minimizes the total squared distance to all points, with each point orthogonally projected onto the candidate geodesic. Gradient and Hessian computations use Jacobi fields to account for manifold curvature (Sommer et al., 2010, Lazar et al., 2016, Chakraborty et al., 2016).

Curvature influences PGA outcomes:

In regions of high curvature or when data have large spread, linearized PGA deviates from "true" principal geodesics and may fail to capture dominant variation directions.
Taylor-expansion-based analysis of PGA shows that for small-scale data, curvature only minimally perturbs the principal direction, whereas for data with large spread the PGA principal directions can differ markedly from their first-order tangent-space approximations, notably in negative-curvature settings (Lazar et al., 2016).

Constant-curvature manifolds (e.g., spheres $S^d$, hyperbolic spaces $\mathbb{H}^d$) admit closed-form expressions for principal geodesic projection, enabling efficient "exact" PGA (Chakraborty et al., 2016).
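The closed-form projection on a constant-curvature space makes a small comparison of exact and linearized PGA possible. The sketch below is a hypothetical illustration on $S^2$, not the algorithm of Chakraborty et al. (2016) or the Jacobi-field method of Sommer et al. (2010). It assumes the Fréchet mean is the north pole $e_3$ (enforced by symmetrizing the data), uses the closed-form distance $\arcsin|\langle x, n\rangle|$ from a unit vector $x$ to the great circle with unit normal $n$, and replaces gradient-based optimization with a brute-force grid search over the direction angle. Linearized PGA is PCA of the log-mapped data in $T_{e_3}S^2$.

```python
import numpy as np

def exact_pga_1d_s2(X, n_grid=3600):
    """Brute-force exact 1-D PGA on S^2 over great circles through e3 = (0, 0, 1)."""
    phis = np.linspace(0.0, np.pi, n_grid, endpoint=False)
    # The great circle through e3 with direction u = (cos phi, sin phi, 0)
    # has unit normal n = e3 x u = (-sin phi, cos phi, 0).
    normals = np.stack([-np.sin(phis), np.cos(phis), np.zeros_like(phis)], axis=1)
    # Geodesic distance from every point to every candidate great circle.
    dists = np.arcsin(np.clip(np.abs(X @ normals.T), 0.0, 1.0))  # (N, n_grid)
    return phis[np.argmin(np.sum(dists ** 2, axis=0))]

def tangent_pga_1d_s2(X):
    """Linearized 1-D PGA: PCA of log_{e3}(x_i) written in (x, y) coordinates."""
    xy = X[:, :2]
    rho = np.linalg.norm(xy, axis=1)
    theta = np.arctan2(rho, X[:, 2])                   # geodesic distance to e3
    V = theta[:, None] * xy / np.maximum(rho, 1e-15)[:, None]  # log_{e3}(x_i)
    _, E = np.linalg.eigh(V.T @ V / len(X))
    u = E[:, -1]                                       # leading eigenvector
    return np.arctan2(u[1], u[0]) % np.pi

def make_symmetric_data(spread, n=200, axis_angle=0.6, seed=0):
    """Anisotropic points around e3; adding the reflection -t of every tangent
    point t makes e3 a critical point of the Fréchet functional."""
    rng = np.random.default_rng(seed)
    c, s = np.cos(axis_angle), np.sin(axis_angle)
    T = rng.normal(size=(n, 2)) * np.array([3.0, 1.0]) * spread
    T = T @ np.array([[c, s], [-s, c]])                # major axis -> (c, s)
    T = np.vstack([T, -T])
    r = np.maximum(np.linalg.norm(T, axis=1, keepdims=True), 1e-15)
    return np.hstack([np.sin(r) * T / r, np.cos(r)])   # exp_{e3} of each t

X = make_symmetric_data(spread=0.05)
phi_exact = exact_pga_1d_s2(X)
phi_lin = tangent_pga_1d_s2(X)
gap = abs(phi_exact - phi_lin) % np.pi
gap = min(gap, np.pi - gap)                            # directions are defined mod pi
```

For this concentrated data set the two directions agree closely, consistent with the small-scale behavior described above; the same functions can be applied to more spread-out or asymmetric data to explore where the two solutions disagree.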

5. Generalizations and Extensions

(a) Riemannian PCA for Generic Local Metrics

The Riemannian PCA (R-PCA) paradigm extends PGA to data without explicit manifold structure by first endowing them with local Riemannian metrics, e.g., a UMAP-induced local metric. This yields a data-driven local geometry in which Fréchet means and tangent-space covariances are computed with locally defined distances; the resulting estimates converge to population-level results as the sample size increases (Rodríguez, 30 May 2025).

(b) Principal Geodesic Analysis in Non-classical Geometries

Wasserstein Geometry: GPCA on probability measures in the $2$-Wasserstein space replaces Euclidean barycenters and lines with Wasserstein barycenters and geodesics parameterized through optimal transport maps or diffeomorphisms, with closed-form solutions in the Gaussian case via the Bures–Wasserstein metric (Vesseron et al., 4 Jun 2025, Seguy et al., 2015, Bigot et al., 2013).
Phylogenetic Treespace: In tree-shape spaces (CAT(0) complexes), principal components are loci of weighted Fréchet means over the standard simplex, and projection algorithms exploit the unique-geodesic property of the metric space (Nye et al., 2016).
Grassmann Manifold: PGA on the Grassmannian $\mathrm{Gr}(k, n)$ of $k$-dimensional subspaces of $\mathbb{R}^n$ and related homogeneous manifolds computes a Fréchet mean, maps the data to its tangent space, and extracts geodesic submanifolds by eigen-decomposition (Giovanis et al., 2024, Yang et al., 2020).
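In the Gaussian case, both the $W_2$ distance and the geodesic between two measures have closed forms, which is what makes Gaussian Wasserstein GPCA tractable. The sketch below shows only these geometric building blocks, not the GPCA algorithm of any cited paper; function names are hypothetical and covariances are assumed symmetric positive definite. It computes $W_2^2 = \|m_1 - m_2\|^2 + \operatorname{tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$ and the displacement-interpolation geodesic $\Sigma_t = ((1-t)I + tA)\,\Sigma_1\,((1-t)I + tA)$, where $A = \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}$ is the linear part of the optimal transport map.

```python
import numpy as np

def sqrtm_psd(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    w, Q = np.linalg.eigh((S + S.T) / 2)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

def w2_gaussian(m1, S1, m2, S2):
    """2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r1 = sqrtm_psd(S1)
    bures2 = np.trace(S1 + S2 - 2.0 * sqrtm_psd(r1 @ S2 @ r1))
    return np.sqrt(max(np.sum((m1 - m2) ** 2) + bures2, 0.0))

def w2_geodesic(m1, S1, m2, S2, t):
    """Mean and covariance at time t on the W2 geodesic (displacement interpolation)."""
    r1 = sqrtm_psd(S1)
    r1_inv = np.linalg.inv(r1)
    A = r1_inv @ sqrtm_psd(r1 @ S2 @ r1) @ r1_inv  # OT map: x -> m2 + A (x - m1)
    B = (1.0 - t) * np.eye(len(m1)) + t * A
    return (1.0 - t) * m1 + t * m2, B @ S1 @ B

m1, S1 = np.array([0.0, 0.0]), np.array([[2.0, 0.5], [0.5, 1.0]])
m2, S2 = np.array([1.0, -1.0]), np.array([[1.0, -0.3], [-0.3, 0.5]])
mt, St = w2_geodesic(m1, S1, m2, S2, 0.3)
```

Because the interpolation is a constant-speed geodesic, $W_2$ from the start to the point at $t = 0.3$ equals $0.3$ times the total distance; for commuting (e.g., diagonal) covariances the Bures term reduces to $\sum_j (\sqrt{a_j} - \sqrt{b_j})^2$.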

(c) Probabilistic and Mixture Models

Probabilistic PGA (PPGA) and its mixture extensions (MPPGA, MBPGA) embed the principal geodesic framework within a latent variable model, combining mixture clustering, automatic relevance determination of geodesic directions, and maximum-likelihood estimation via EM algorithms, enabling robust modeling of multi-modal manifold data (Zhang et al., 2019).

6. Application Domains and Limitations

Application domains:

Directional statistics: Orientation data on spheres and rotations.
Shape analysis: Landmark-based configurations, planar and 3D shapes.
Covariance descriptors: SPD matrix-valued features in computer vision.
Probability measures: Distributional data modeling (e.g., images as distributions).
Mechanical systems: Analysis of trajectories on Lie groups and related manifolds, e.g., in director-based dynamics of hybrid mechanical systems (Gebhardt et al., 2022).
Topological summaries: Merge trees, persistence diagrams.
Climate time series: Path signatures viewed as points on a Lie group or associated manifolds (Sugiura, 2023).

Limitations:

Computational cost increases sharply with data spread, ambient dimension, or curvature.
Validity of tangent-space linearization is restricted to neighborhoods inside the injectivity radius.
PGA is less effective for data exhibiting large, intrinsically nonlinear modes unless using exact or domain-specialized algorithms.
For complex manifold structures or non-geodesic data clusters, non-geodesic or nested manifold learning methods may outperform standard PGA (Yang et al., 2020).

7. Comparative Performance and Theoretical Guarantees

Method	Geometry	Strengths	Limitations
Tangent-PGA	Any manifold	Efficient, closed-form eigen-decomposition. Accurate for locally concentrated data.	Fails for data with large spread/strong curvature.
Exact PGA	Any manifold	True geodesic residual minimization, curvature-aware.	High computational complexity.
Constant-curvature PGA	$S^d$, $\mathbb{H}^d$	Closed-form projections, efficient.	Only applicable on constant-curvature spaces.
Wasserstein GPCA	$W_2$ space of probability measures	Matches mass-transport geometry, closed-form for Gaussians, scalable via entropic regularization.	Computation of barycenters and projections is challenging for general measures.
Mixture/Bayesian PGA	Any	Captures multimodality, automatic dimensionality selection.	Requires careful hyperparameter setting; EM may converge to a local rather than global optimum.

Consistency results ensure that as $N \to \infty$, sample Fréchet means and principal geodesics converge to their population counterparts under mild regularity and concentration conditions (Rodríguez, 30 May 2025, Bigot et al., 2013).

References

"Dimensionality Reduction on Riemannian Manifolds in Data Analysis" (Ichi et al., 5 Feb 2026)
"Riemannian Principal Component Analysis" (Rodríguez, 30 May 2025)
"On the Wasserstein Geodesic Principal Component Analysis of probability measures" (Vesseron et al., 4 Jun 2025)
"An efficient Exact-PGA algorithm for constant curvature manifolds" (Chakraborty et al., 2016)
"Optimization over Geodesics for Exact Principal Geodesic Analysis" (Sommer et al., 2010)
"Principal component analysis and the locus of the Frechet mean in the space of phylogenetic trees" (Nye et al., 2016)
"Principal Geodesic Analysis in Director-Based Dynamics of Hybrid Mechanical Systems" (Gebhardt et al., 2022)
"Polynomial Chaos Expansions on Principal Geodesic Grassmannian Submanifolds for Surrogate Modeling and Uncertainty Quantification" (Giovanis et al., 2024)
"Scale and curvature effects in principal geodesic analysis" (Lazar et al., 2016)
"Mixture Probabilistic Principal Geodesic Analysis" (Zhang et al., 2019)
"Geodesic PCA in the Wasserstein space" (Bigot et al., 2013)

Principal Geodesic Analysis provides a unifying, geometry-respecting statistical framework for dimensionality reduction and exploratory analysis across manifold-valued data settings, combining rigorous mathematical foundations with methods tailored for a wide array of application domains.