
Principal Geodesic Analysis

Updated 2 March 2026
  • Principal Geodesic Analysis is a framework that extends PCA to nonlinear manifolds by replacing linear subspaces with geodesic submanifolds.
  • It employs Riemannian geometry tools like exponential and logarithmic maps, Fréchet means, and tangent-space eigen-decomposition to extract principal directions.
  • PGA preserves intrinsic geometric structures for data on spheres, SPD matrices, and Grassmannians, enabling accurate and reliable dimensionality reduction.

Principal Geodesic Analysis (PGA) generalizes classical Principal Component Analysis (PCA) to data constrained to nonlinear manifolds, replacing Euclidean linear subspaces with geodesic submanifolds that respect the intrinsic geometry of the data domain. Unlike PCA, which projects data onto lines or planes in a vector space, PGA utilizes the exponential and logarithmic maps of Riemannian geometry to linearize manifold-valued data around their intrinsic mean, compute variance-maximizing directions in the tangent space, and map those directions back to the manifold as geodesic flows. This technique is essential for data types that naturally reside on manifolds—such as spheres, Grassmannians, symmetric positive-definite (SPD) matrices, shape spaces, and spaces of probability measures—and provides dimensionality reduction that preserves the underlying geometric structure, thus avoiding the distortions inherent in Euclidean projections (Ichi et al., 5 Feb 2026, Rodríguez, 30 May 2025).

1. Core Mathematical Framework

Let $(M,g)$ denote a complete $d$-dimensional Riemannian manifold with metric $g$. Given $N$ data points $p_i \in M$:

  1. Fréchet (Karcher) Mean: The intrinsic mean $\bar{p}$ is defined as

$$\bar{p} = \arg\min_{p \in M} \sum_{i=1}^N d_M^2(p, p_i),$$

where $d_M$ is the geodesic distance.

  2. Logarithm Map to Tangent Space: Data points are mapped to the tangent space at $\bar{p}$ via

$$v_i = \log_{\bar{p}}(p_i) \in T_{\bar{p}}M,$$

where $\log_{\bar{p}}$ is the inverse of the exponential map near $\bar{p}$ (i.e., it solves $\exp_{\bar{p}}(v_i) = p_i$).

  3. Covariance Operator: Compute the empirical covariance matrix in $T_{\bar{p}}M$:

$$\Sigma = \frac{1}{N} \sum_{i=1}^N v_i v_i^\top.$$

  4. Principal Geodesic Directions: Eigen-decompose $\Sigma = U \Lambda U^\top$. The leading eigenvectors $u_1, \dots, u_k$ define orthonormal principal geodesic directions.
  5. Projection and Reconstruction: The low-dimensional reconstruction of $p_i$ is

$$\hat{p}_i = \exp_{\bar{p}}\Big( \sum_{j=1}^k \alpha_{ij} u_j \Big),$$

where the $\alpha_{ij} = \langle v_i, u_j \rangle$ are PGA scores (Ichi et al., 5 Feb 2026, Rodríguez, 30 May 2025).

This workflow is the direct Riemannian generalization of linear PCA, automatically reducing to the classical case when $M = \mathbb{R}^d$ with the Euclidean metric.
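The five steps above can be sketched for the unit sphere, where the exponential and logarithm maps have standard closed forms. This is a minimal illustration, not a reference implementation: the function names, the $1/N$ covariance normalization, the stopping tolerance, and the toy data (points on one great circle near the north pole) are all assumptions made for the example.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p along v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * (v / norm_v)

def sphere_log(p, q):
    """Logarithm map on the unit sphere (q must not be antipodal to p)."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    w = q - cos_theta * p                      # part of q orthogonal to p
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(p)
    return np.arccos(cos_theta) * w / norm_w

def frechet_mean(points, max_iter=100, tol=1e-10):
    """Fixed-point iteration: average the log-mapped data, step along exp."""
    mean = points[0] / np.linalg.norm(points[0])
    for _ in range(max_iter):
        step = np.mean([sphere_log(mean, q) for q in points], axis=0)
        mean = sphere_exp(mean, step)
        mean = mean / np.linalg.norm(mean)     # guard against numerical drift
        if np.linalg.norm(step) < tol:
            break
    return mean

def tangent_pga(points, k):
    """Tangent-space PGA; k should not exceed the rank of the covariance."""
    mean = frechet_mean(points)
    V = np.array([sphere_log(mean, q) for q in points])   # rows lie in T_mean
    cov = V.T @ V / len(points)                # assumed 1/N normalization
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    U = eigvecs[:, order]                      # columns are principal directions
    scores = V @ U                             # PGA scores <v_i, u_j>
    recon = np.array([sphere_exp(mean, U @ a) for a in scores])
    return mean, U, eigvals[order], scores, recon

# Toy data: 21 points on a single great circle of S^2 around the north pole.
t = np.linspace(-0.5, 0.5, 21)
points = np.column_stack([np.sin(t), np.zeros_like(t), np.cos(t)])
mean, U, variances, scores, recon = tangent_pga(points, k=1)
```

Because the toy data lie on one geodesic through the mean, a single principal direction should reconstruct them up to floating-point error; real data would leave a nonzero residual.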

2. Riemannian Geometry Primitives

  • Geodesic Distance: $d_M(p, q)$ is the length of the shortest path joining $p, q \in M$, computed as the integral of the norm of the velocity vector along a geodesic curve.
  • Exponential Map: $\exp_p : T_pM \to M$ maps a tangent vector to a point on $M$ via geodesic flow.
  • Logarithm Map: $\log_p : M \to T_pM$ is defined locally as the inverse of the exponential map.
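  • Fréchet Mean Uniqueness: The Fréchet mean is unique on complete, simply connected manifolds of non-positive curvature (Hadamard manifolds). On positively curved or compact manifolds such as spheres, it is unique only when the data lie within a sufficiently small geodesic ball.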
  • Parallel Transport: In more advanced variants (e.g., for subspace deflation), parallel transport moves vectors along geodesics without changing their norm or inner product, providing tangent-space consistency.

Closed-form expressions for $\exp$ and $\log$ are available for canonical cases: spheres ($S^n$), SPD matrices under the affine-invariant (AIRM) metric, and Grassmannians (Ichi et al., 5 Feb 2026, Giovanis et al., 2024).
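Parallel transport is the one primitive in the list above that is not needed for basic tangent-space PGA, so a small sketch may help. The example below uses the standard closed form for the unit sphere along the minimizing geodesic from $p$ to $q$; the function names and test vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def sphere_log(p, q):
    """Logarithm map on the unit sphere (q must not be antipodal to p)."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    w = q - cos_theta * p
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(p)
    return np.arccos(cos_theta) * w / norm_w

def sphere_parallel_transport(p, q, w):
    """Transport w in T_p to T_q along the minimizing geodesic from p to q."""
    log_pq = sphere_log(p, q)
    log_qp = sphere_log(q, p)
    dist_sq = np.dot(log_pq, log_pq)
    if dist_sq < 1e-24:                 # p and q coincide: nothing to transport
        return w.copy()
    return w - (np.dot(log_pq, w) / dist_sq) * (log_pq + log_qp)

p = np.array([0.0, 0.0, 1.0])
q = np.array([np.sin(0.8), 0.0, np.cos(0.8)])
w1 = np.array([0.3, 0.4, 0.0])          # two tangent vectors at p
w2 = np.array([-0.2, 0.5, 0.0])
t1 = sphere_parallel_transport(p, q, w1)
t2 = sphere_parallel_transport(p, q, w2)
```

The transported vectors are tangent at `q`, and their norms and mutual inner product match those of `w1` and `w2`, which is the consistency property the bullet describes.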

3. Algorithmic and Computational Aspects

PGA’s computational pipeline involves iterative steps:

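| Step | Operations per iteration | Typical Complexity |
|---|---|---|
| Fréchet mean | $N$ log-maps, one exp-map update | $O(N c_{\log})$ per iteration |
| Log-mapping data | $N$ log-maps | $O(N c_{\log})$ |
| Tangent covariance | Forming $N$ outer products in $\mathbb{R}^{d \times d}$ | $O(N d^2)$ |
| Eigen-decomposition | Tangent covariance eigendecomposition | $O(d^3)$ |

Here $c_{\log}$ denotes the cost of one log-map evaluation on the manifold at hand.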

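For the hypersphere $S^n$:

  • $\exp_p(v) = \cos(\|v\|)\, p + \sin(\|v\|)\, \frac{v}{\|v\|}$
  • $\log_p(q) = \frac{\theta}{\sin\theta}\,(q - \cos\theta\, p)$, with $\theta = \arccos\langle p, q \rangle$

For SPD matrices under the AIRM metric:

  • $\exp_P(V) = P^{1/2} \exp\big(P^{-1/2} V P^{-1/2}\big) P^{1/2}$
  • $\log_P(Q) = P^{1/2} \log\big(P^{-1/2} Q P^{-1/2}\big) P^{1/2}$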
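For SPD matrices, the affine-invariant exponential and logarithm maps can be sketched with symmetric eigendecompositions alone. The helper names below are hypothetical, and the matrices `P`, `Q`, and `A` are arbitrary test values. The matrix functions are applied through `numpy.linalg.eigh`, which assumes symmetric input.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    S = (S + S.T) / 2.0                        # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(S)
    return (eigvecs * fun(eigvals)) @ eigvecs.T

def spd_exp(P, V):
    """AIRM exponential map at P applied to a symmetric tangent matrix V."""
    P_half = sym_fun(P, np.sqrt)
    P_inv_half = sym_fun(P, lambda x: 1.0 / np.sqrt(x))
    return P_half @ sym_fun(P_inv_half @ V @ P_inv_half, np.exp) @ P_half

def spd_log(P, Q):
    """AIRM logarithm map at P of an SPD matrix Q."""
    P_half = sym_fun(P, np.sqrt)
    P_inv_half = sym_fun(P, lambda x: 1.0 / np.sqrt(x))
    return P_half @ sym_fun(P_inv_half @ Q @ P_inv_half, np.log) @ P_half

def spd_dist(P, Q):
    """AIRM geodesic distance: Frobenius norm of log(P^{-1/2} Q P^{-1/2})."""
    P_inv_half = sym_fun(P, lambda x: 1.0 / np.sqrt(x))
    return np.linalg.norm(sym_fun(P_inv_half @ Q @ P_inv_half, np.log), "fro")

P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 3.0]])
V = spd_log(P, Q)
Q_back = spd_exp(P, V)                         # round trip should recover Q

A = np.array([[1.0, 2.0], [0.0, 1.0]])         # any invertible matrix
d_original = spd_dist(P, Q)
d_congruent = spd_dist(A @ P @ A.T, A @ Q @ A.T)
```

The round trip `spd_exp(P, spd_log(P, Q))` returns `Q`, and the distance is unchanged under the congruence $P \mapsto A P A^\top$, which is the invariance that gives the metric its name.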

The overall complexity is typically $O(T N c_{\log} + N d^2 + d^3)$, where $T$ is the mean-iteration count and $c_{\log}$ is the cost of one log-map evaluation. The dominant costs arise in high dimensions and for expensive log/exp evaluations (Ichi et al., 5 Feb 2026).

4. Exact vs. Linearized PGA and Curvature Effects

Two broad classes of algorithms are prevalent:

  • Linearized (Tangent-Space) PGA: Applies PCA to $v_i = \log_{\bar{p}}(p_i)$ in $T_{\bar{p}}M$ and maps principal directions back to $M$ via $\exp_{\bar{p}}$. Accurate when data are concentrated within a convex normal neighborhood of $\bar{p}$.
  • Exact PGA: Optimizes principal geodesic subspaces directly on $M$. For a 1D principal geodesic, searches for the geodesic through $\bar{p}$ that minimizes total squared distance to all points, with each point orthogonally projected onto the candidate geodesic. Gradient and Hessian computation utilize Jacobi fields to account for manifold curvature (Sommer et al., 2010, Lazar et al., 2016, Chakraborty et al., 2016).

Curvature influences PGA outcomes:

  • In regions of high curvature or when data have large spread, linearized PGA deviates from "true" principal geodesics and may fail to capture dominant variation directions.
  • Taylor-expansion-based PGA analysis shows that for small-scale data, curvature minimally perturbs the principal direction; for large spread, the principal directions can shift noticeably, particularly in negative-curvature settings (Lazar et al., 2016).

Constant-curvature manifolds (e.g., $S^n$, hyperbolic spaces) admit closed-form expressions for principal geodesic projection, enabling efficient "exact" PGA (Chakraborty et al., 2016).
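The difference between the two classes can be sketched on $S^2$, where the distance from a point to a great circle has a closed form. The example compares the leading tangent-space PCA direction with a brute-force minimizer of the true geodesic residuals. The grid search is a stand-in chosen for brevity, not the Jacobi-field optimization used in the cited work. The toy data and all names are assumptions; the data are built to be antipodally symmetric in the tangent plane, so the north pole is taken as the Fréchet mean.

```python
import numpy as np

def point_from_polar(r, psi):
    """exp at the north pole of the tangent vector with length r and angle psi."""
    return np.array([np.sin(r) * np.cos(psi), np.sin(r) * np.sin(psi), np.cos(r)])

# Toy data, symmetric under psi -> psi + pi, so the mean is the north pole.
polar = [(0.3, 0.2), (0.1, 0.2 + np.pi / 2), (0.2, 0.5)]
polar = polar + [(r, psi + np.pi) for r, psi in polar]
points = np.array([point_from_polar(r, psi) for r, psi in polar])

# Linearized PGA: PCA on the log-mapped data in tangent coordinates.
tangent = np.array([[r * np.cos(psi), r * np.sin(psi)] for r, psi in polar])
cov = tangent.T @ tangent / len(tangent)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
lead = eigvecs[:, -1]
phi_linear = np.arctan2(lead[1], lead[0]) % np.pi

def residual_sq(phi, q):
    """Squared geodesic distance from q to the great circle through the
    north pole whose tangent direction has angle phi."""
    normal = np.array([-np.sin(phi), np.cos(phi), 0.0])   # normal of the circle's plane
    return np.arcsin(min(abs(q @ normal), 1.0)) ** 2

# "Exact" 1D PGA by exhaustive search over directions (directions are mod pi).
phis = np.linspace(0.0, np.pi, 3600, endpoint=False)
costs = np.array([sum(residual_sq(phi, q) for q in points) for phi in phis])
phi_exact = phis[np.argmin(costs)]

gap = abs(phi_exact - phi_linear)
gap = min(gap, np.pi - gap)                     # angular gap between the two directions
```

With all points within 0.3 rad of the mean, the two directions should nearly coincide; increasing the radii in `polar` is one way to explore how the gap behaves as the spread grows.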

5. Generalizations and Extensions

(a) Riemannian PCA for Generic Local Metrics

The Riemannian PCA (R-PCA) paradigm extends PGA to data without explicit manifold structure by first endowing the data with local Riemannian metrics, for example a UMAP-induced local metric. This data-driven local geometry enables computation of Fréchet means and tangent-space covariances with locally defined distances, and the resulting estimates converge to population-level results as the sample size increases (Rodríguez, 30 May 2025).

(b) Principal Geodesic Analysis in Non-classical Geometries

(c) Probabilistic and Mixture Models

Probabilistic PGA (PPGA) and its mixture extensions (MPPGA, MBPGA) embed the principal geodesic framework within a latent variable model, combining mixture clustering, automatic relevance determination of geodesic directions, and maximum-likelihood estimation via EM algorithms, enabling robust modeling of multi-modal manifold data (Zhang et al., 2019).

6. Application Domains and Limitations

Application domains:

  • Directional statistics: Orientation data on spheres and rotations.
  • Shape analysis: Landmark-based configurations, planar and 3D shapes.
  • Covariance descriptors: SPD matrix-valued features in computer vision.
  • Probability measures: Distributional data modeling (e.g., images as distributions).
  • Mechanical systems: Analysis of trajectories on $SO(3)$, $SE(3)$, Lie groups.
  • Topological summaries: Merge trees, persistence diagrams.
  • Climate time series: Path signatures viewed as points on a Lie group or an associated manifold (Sugiura, 2023).

Limitations:

  • Computational cost increases sharply with data spread, ambient dimension, or curvature.
  • Validity of tangent-space linearization is restricted to neighborhoods inside the injectivity radius.
  • PGA is less effective for data exhibiting large, intrinsically nonlinear modes unless using exact or domain-specialized algorithms.
  • For complex manifold structures or non-geodesic data clusters, non-geodesic or nested manifold learning methods may outperform standard PGA (Yang et al., 2020).
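The limits of tangent-space linearization can be made concrete with a small distortion check on $S^2$. Two points are placed at the same geodesic distance `r` from the base point, a quarter turn apart, and the distance between their log-mapped images is compared with their true geodesic distance. The setup and names are assumptions chosen for illustration.

```python
import numpy as np

def sphere_log(p, q):
    """Logarithm map on the unit sphere (q must not be antipodal to p)."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    w = q - cos_theta * p
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(p)
    return np.arccos(cos_theta) * w / norm_w

north = np.array([0.0, 0.0, 1.0])

def distortion(r, delta=np.pi / 2):
    """Ratio of tangent-space distance to geodesic distance for two points
    at distance r from the north pole, separated by longitude delta."""
    q1 = np.array([np.sin(r) * np.cos(delta / 2), np.sin(r) * np.sin(delta / 2), np.cos(r)])
    q2 = np.array([np.sin(r) * np.cos(delta / 2), -np.sin(r) * np.sin(delta / 2), np.cos(r)])
    tangent_dist = np.linalg.norm(sphere_log(north, q1) - sphere_log(north, q2))
    geodesic_dist = np.arccos(np.clip(np.dot(q1, q2), -1.0, 1.0))
    return tangent_dist / geodesic_dist

radii = (0.1, 0.5, 1.0, 2.0, 3.0)
ratios = {r: distortion(r) for r in radii}
for r in radii:
    print(f"r = {r:.1f}  tangent/geodesic = {ratios[r]:.3f}")
```

The ratio stays close to 1 for points near the base point and grows as `r` approaches $\pi$, the injectivity radius of the unit sphere, where the two points are close on the manifold but far apart in the tangent plane.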

7. Comparative Performance and Theoretical Guarantees

| Method | Geometry | Strengths | Limitations |
|---|---|---|---|
| Tangent-PGA | Any manifold | Efficient, closed-form eigen-decomposition. Accurate for locally concentrated data. | Fails for data with large spread/strong curvature. |
| Exact PGA | Any manifold | True geodesic residual minimization, curvature-aware. | High computational complexity. |
| Constant-curvature PGA | $S^n$, $\mathbb{H}^n$ | Closed-form projections, efficient. | Only applicable on constant curvature spaces. |
| Wasserstein GPCA | Wasserstein space $W_2$ | Matches mass-transport geometry, closed-form for Gaussians, scalable via entropic regularization. | Computation of barycenters and projections is challenging for general measures. |
| Mixture/Bayesian PGA | Any | Captures multimodality, automatic dimensionality selection. | Requires careful hyperparameter setting, EM convergence not guaranteed to global optimum. |

Consistency results ensure that as $N \to \infty$, sample Fréchet means and principal geodesics converge to their population counterparts under mild regularity and concentration conditions (Rodríguez, 30 May 2025, Bigot et al., 2013).
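A small simulation can illustrate the consistency statement for the Fréchet mean; it does not reproduce any result from the cited papers. Samples are drawn by pushing an isotropic Gaussian in the tangent plane at the north pole of $S^2$ through the exponential map. By rotational symmetry the north pole is taken as the population mean. The spread `sigma`, the seed, and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
north = np.array([0.0, 0.0, 1.0])

def sample_points(n, sigma=0.3):
    """Draw tangent vectors at the north pole and map them onto S^2 via exp."""
    ab = rng.normal(scale=sigma, size=(n, 2))
    r = np.linalg.norm(ab, axis=1)
    sinc = np.sinc(r / np.pi)                  # equals sin(r)/r, safe at r = 0
    return np.column_stack([sinc * ab[:, 0], sinc * ab[:, 1], np.cos(r)])

def sphere_log_batch(p, Q):
    """Log map at p for every row of Q (no row may be antipodal to p)."""
    cos_t = np.clip(Q @ p, -1.0, 1.0)
    W = Q - cos_t[:, None] * p
    norms = np.linalg.norm(W, axis=1)
    theta = np.arccos(cos_t)
    scale = np.divide(theta, norms, out=np.zeros_like(theta), where=norms > 1e-12)
    return scale[:, None] * W

def frechet_mean(Q, max_iter=100, tol=1e-12):
    """Fixed-point iteration started from the normalized Euclidean mean."""
    mean = Q.mean(axis=0)
    mean = mean / np.linalg.norm(mean)
    for _ in range(max_iter):
        step = sphere_log_batch(mean, Q).mean(axis=0)
        r = np.linalg.norm(step)
        if r < tol:
            break
        mean = np.cos(r) * mean + np.sin(r) * step / r
        mean = mean / np.linalg.norm(mean)
    return mean

errors = {}
for n in (50, 500, 5000):
    estimate = frechet_mean(sample_points(n))
    errors[n] = np.arccos(np.clip(estimate @ north, -1.0, 1.0))
    print(f"N = {n:5d}  geodesic error of sample mean = {errors[n]:.4f}")
```

The geodesic error of the sample mean is expected to shrink roughly like $1/\sqrt{N}$ in this concentrated setting, although any single run fluctuates around that trend.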


Principal Geodesic Analysis provides a unifying, geometry-respecting statistical framework for dimensionality reduction and exploratory analysis across manifold-valued data settings, combining rigorous mathematical foundations with methods tailored for a wide array of application domains.
