Structure-Preserving Dimensionality Reduction

Updated 3 December 2025
  • SPDR is a family of dimensionality reduction methods that preserve essential structures such as local geometry, global relationships, and topological features.
  • It employs diverse methodologies such as Riemannian geometric mappings, subspace-preserving projections, and topology-aware techniques to maintain intrinsic data properties.
  • SPDR algorithms balance computational efficiency with rigorous mathematical guarantees, making them well suited to applications ranging from brain-computer interfaces to high-dimensional clustering.

Structure-Preserving Dimensionality Reduction (SPDR) refers to a broad family of dimensionality reduction frameworks, algorithms, and mathematical principles whose central objective is to produce low-dimensional representations of high-dimensional data that rigorously preserve meaningful structure—such as local geometry, global relationships, subspace configuration, intrinsic topology, or system dynamics—present in the original domain. Preservation of such structure is essential in applications where distances, clusters, manifolds, subspace arrangements, or dynamical invariants carry direct interpretability, impact downstream tasks, or encode scientific knowledge. SPDR encompasses Riemannian manifold DR, union-of-subspaces methods, topology- and homology-preserving techniques, model reduction for dynamical systems, and density or correlation-preserving approaches.

1. Foundational Principles and Mathematical Formulations

Each SPDR technique is formulated around the specific aspect of structure it aims to preserve:

  • Local geometry: Many SPDR methods preserve local distances or geodesic structure between data points and their neighborhoods. On Riemannian manifolds (e.g., SPD matrices), this employs geodesic distances computed under metrics such as the Affine-Invariant Riemannian Metric (AIRM):

$$d_R(X, Y) = \| \log(X^{-1/2} Y X^{-1/2}) \|_F$$

Preserving distances to local means (Karcher means) ensures that local curvature and relationships are maintained (Davoudi et al., 2016); a numerical sketch of this distance follows this list.

  • Global structure: Approaches such as metric MDS, heat geodesics, and correlation-based methods seek to preserve extensive interpoint relationships. For example, preserving Pearson and Spearman correlations between selected pairwise distances ensures global arrangement fidelity (Gildenblat et al., 10 Mar 2025).
  • Subspace or algebraic structure: When data are drawn from a union of independent subspaces, SPDR aims for a projection that preserves subspace independence, often using principal vector pairs and explicit theorems on minimal embedding dimension (e.g., $2K$ dimensions sufficing for $K$ classes) (Arpit et al., 2014).
  • Topological invariants: Homology- or topology-preserving SPDR targets the maintenance of persistent Betti numbers, loops, and connected components via combinatorial constructions such as persistent homology and Reeb graph skeletons (Yan et al., 2018).
  • Dynamical or system structure: Structure-preserving model reduction for marginally stable LTI systems maintains invariant subspaces associated with stability and the spectra of the original system, using Lyapunov and Hamiltonian structures (Peng et al., 2017).
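
To make the AIRM geodesic concrete, here is a minimal numerical sketch of the distance defined above; the eigendecomposition-based routine and the random SPD test matrices are illustrative, not drawn from the cited papers.

```python
import numpy as np

def airm_distance(X, Y):
    """AIRM geodesic distance d_R(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F
    between symmetric positive definite (SPD) matrices X and Y."""
    # X^{-1/2} via eigendecomposition (valid because X is SPD).
    w, V = np.linalg.eigh(X)
    X_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = X_inv_sqrt @ Y @ X_inv_sqrt  # SPD whenever X and Y are SPD
    # ||log(M)||_F is the 2-norm of the elementwise log of M's eigenvalues.
    lam = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

# Illustrative usage on random SPD matrices.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); X = A @ A.T + 5 * np.eye(5)
B = rng.standard_normal((5, 5)); Y = B @ B.T + 5 * np.eye(5)
print(airm_distance(X, Y))  # affine-invariant: unchanged under X, Y -> P X P^T, P Y P^T
```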

2. Key Methodological Families in SPDR

SPDR algorithms span a diverse spectrum, each matched to the structure of interest and to the mathematical model of the data:

  • Riemannian Geometric Methods for SPD Matrices: Mapping symmetric positive definite matrices between manifolds of different dimensions while preserving geodesic or divergence-defined structure is a canonical SPDR task (Harandi et al., 2016, Davoudi et al., 2016). Methods such as Distance Preservation to Local Mean (DPLM) optimize orthonormal projections to preserve Riemannian or Jensen–Bregman Log-Det distances to Karcher means and use Stiefel or Grassmann manifold optimization.
  • Subspace-Preserving Linear Projections: For data modeled as lying in a union of independent subspaces, SPDR leverages the principal-angles framework to construct projections that maintain subspace independence. The $2K$-vector theorem provides an explicit target dimension and an alternating least-squares procedure for projection construction (Arpit et al., 2014); see the sketch after this list.
  • Structure-aware Graph and Diffusion Frameworks: Approaches such as Heat Geodesic Embeddings directly estimate manifold geodesics using heat kernels and optimize stress for multidimensional scaling that reflects diffusion-defined distances. These blend theoretical guarantees from Riemannian geometry (e.g., Varadhan’s formula) with computational efficiency (Huguet et al., 2023).
  • Topology/Homology-Preserving Embedding: Methods inspired by topological data analysis (TDA) use constructions such as the Vietoris–Rips complex, Mapper skeletons, and landmark selection to ensure that key homological features—loops, voids—are preserved in the embedding space. Pipeline components include k-NN graphs, DBSCAN cluster covers, and minimal manifold tearing guided by persistent homology (Yan et al., 2018).
  • Structure-Preserving Model Reduction for LTI Systems: For dynamical systems with both stable and marginally stable components, SPDR separates the system, applies inner-product projections (energy-based) to the stable block and symplectic projections to the Hamiltonian block, with rigorous preservation of stability and spectral structure (Peng et al., 2017).
  • Integrated Local/Global Structure Methods: Recently, algorithms such as DREAMS and PCC explicitly balance preservation of local relationships (e.g., KNN, t-SNE loss) with global arrangement (e.g., alignment to PCA, correlation loss) via composite objectives and tunable trade-offs (Gildenblat et al., 10 Mar 2025, Kury et al., 19 Aug 2025).
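
As a concrete illustration of the subspace-preservation idea, the sketch below measures principal angles between two independent subspaces before and after projection to $2K$ dimensions. The random orthonormal projection is a stand-in for the principal-vector construction of Arpit et al., which is not reproduced here.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
n, k, K = 100, 2, 2               # ambient dim, subspace dim, number of classes
A = rng.standard_normal((n, k))   # basis of the class-1 subspace
B = rng.standard_normal((n, k))   # basis of the class-2 subspace (independent w.h.p.)

# Orthonormal projection onto 2K dimensions (random here, for illustration;
# the paper builds it from principal vector pairs).
P, _ = np.linalg.qr(rng.standard_normal((n, 2 * K)))

# A strictly positive smallest principal angle means the subspaces stay disjoint.
before = subspace_angles(A, B).min()
after = subspace_angles(P.T @ A, P.T @ B).min()
print(f"min principal angle before: {before:.3f}, after projection: {after:.3f}")
```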

3. Algorithmic Procedures and Computational Guarantees

Each SPDR methodology couples its mathematical formulation with tailored algorithmic machinery:

  • Manifold-structured data: Gradient or conjugate-gradient optimization with orthogonality constraints on Stiefel or Grassmann manifolds is employed for learning projection matrices $U$ or $W$ (e.g., DPLM, AIRM-based projections); a minimal sketch of such a step follows this list. Fast eigen-problem solvers are deployed when using log-Euclidean metrics (Davoudi et al., 2016, Harandi et al., 2016).
  • Alternating minimization and coordinate descent: Some SPDR variants interleave projection learning with adaptive graph structure updating (e.g., probabilistic structure learning with Laplacian penalties) or alternate between base variables and structure variables (adaptive neighbor graphs, label constraints) (Wang, 2016, Shi et al., 2020).
  • Out-of-sample extensions: Many techniques ensure the learned projection or mapping is explicit or efficiently computable for unseen samples, which is essential for generalization and deployment (Davoudi et al., 2016, Arpit et al., 2014).
  • Computational scaling: Recent methods achieve computational complexity linear or near-linear in the data size when $K$ (neighborhood size) and $m$ (embedding dimension) are small, using approximations such as landmark-based approaches, sparse matrix routines, or batch-style stochastic gradient descent (Davoudi et al., 2016, Huguet et al., 2023).
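
As a minimal sketch of the constrained optimization in the first item above, the following implements one Riemannian gradient step on the Stiefel manifold with a QR retraction; the function names and step size are illustrative, and actual implementations (e.g., for DPLM) use more elaborate conjugate-gradient schemes.

```python
import numpy as np

def qr_retraction(U):
    """Map a matrix back onto the Stiefel manifold (orthonormal columns)."""
    Q, R = np.linalg.qr(U)
    return Q * np.sign(np.diag(R))  # sign fix keeps the retraction well defined

def stiefel_gradient_step(U, euclidean_grad, step=1e-2):
    """One projected-gradient step for min f(U) subject to U^T U = I."""
    # Project the Euclidean gradient onto the tangent space at U:
    # grad = G - U sym(U^T G), with sym(A) = (A + A^T) / 2.
    G = euclidean_grad
    sym = (U.T @ G + G.T @ U) / 2
    riemannian_grad = G - U @ sym
    return qr_retraction(U - step * riemannian_grad)

# Illustrative usage: one step along a random objective gradient.
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((10, 3)))
U1 = stiefel_gradient_step(U0, rng.standard_normal((10, 3)))
assert np.allclose(U1.T @ U1, np.eye(3), atol=1e-8)  # still orthonormal
```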

Mathematical guarantees established in these works include: (a) strict preservation of subspace independence under projection (with given embeddings), (b) bounded perturbation of class separability or Fisher eigenvalues after preprocessing, (c) preservation of homology under landmarking and tearing for generic point clouds, (d) stability and marginal-stability preservation in reduced dynamical systems, and (e) provable alignment between local or global distances in ambient and embedding spaces.

4. Empirical Performance and Application Domains

SPDR has demonstrated broad utility in domains where intrinsic structure is central:

  • Brain–Computer Interface (BCI): DPLM combined with geometric classifiers attains state-of-the-art performance (Cohen's κ ≈ 0.60) on challenging BCI datasets, outperforming previous geometry-aware DR methods; statistical significance is verified with Wilcoxon signed-rank tests (Davoudi et al., 2016).
  • Visual Recognition and Motion Analysis: Geometry-aware SPD DR increases nearest-neighbor classification and kernel clustering accuracy on datasets such as UIUC materials textures, HDM05 mocap, YouTube Celeb, and Keck gestures, especially when using discriminative or maximum-variance objectives (Harandi et al., 2016).
  • High-dimensional Clustering and Classification: Subspace-preserving linear projections achieve superior accuracy in face (Yale B, AR, PIE) and synthetic subspace classification tasks, with improvements in sparse representation-based classification over PCA, LDA, and random projections (Arpit et al., 2014).
  • Single-Cell Genomics and Complex Systems: Correlation- and cluster-preserving DR (PCC) achieves the highest global structure preservation (average GS score 0.83) while maintaining competitive local cluster precision in bioimaging and transcriptomics compared to alternative methods (Gildenblat et al., 10 Mar 2025); a generic scoring sketch follows this list.
  • Topological Data Analysis: Homology-preserving DR recovers all major 1D-homology classes (persistent loops/holes) in point clouds from complex manifolds, outperforming vanilla Isomap and random landmarking, with favorable residual-variance and Wasserstein-homology metrics (Yan et al., 2018).
  • Scientific Computing and Model Reduction: Structure-preserving model reduction methods yield reduced-order LTI representations that exhibit minimal energy drift and robust preservation of system-theoretic stability (Peng et al., 2017).
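
A simple way to quantify the global structure preservation reported above is to correlate ambient and embedded pairwise distances. The sketch below is a generic implementation of that idea, not the exact GS score of the cited work; the function name and pair-sampling scheme are our own.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def global_structure_scores(X_high, X_low, n_pairs=10_000, seed=0):
    """Pearson/Spearman correlation between pairwise distances in the
    ambient space X_high and in the embedding X_low (sampled pairs)."""
    rng = np.random.default_rng(seed)
    n = X_high.shape[0]
    i = rng.integers(0, n, n_pairs)
    j = rng.integers(0, n, n_pairs)
    keep = i != j  # drop self-pairs, whose distances are trivially zero
    d_hi = np.linalg.norm(X_high[i[keep]] - X_high[j[keep]], axis=1)
    d_lo = np.linalg.norm(X_low[i[keep]] - X_low[j[keep]], axis=1)
    return pearsonr(d_hi, d_lo)[0], spearmanr(d_hi, d_lo)[0]
```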

5. Theoretical Limitations and Open Directions

Despite extensive theoretical and empirical validation, SPDR methods face domain-specific and general limitations:

  • Domain Assumptions: The performance of subspace-based SPDR is contingent on the presence of independent subspace structure; severe nonlinear overlap degrades guarantees (Arpit et al., 2014). Riemannian approaches presuppose correct manifold modeling of data.
  • Parameter Selection: Hyperparameters such as number of nearest neighbors, embedding dimension, diffusion time in heat-based methods, and weights balancing local/global terms typically require cross-validation or data-driven heuristics (Davoudi et al., 2016, Huguet et al., 2023, Gildenblat et al., 10 Mar 2025).
  • Scalability: Full pairwise computations (e.g., metric MDS, persistent homology) incur $O(N^2)$ memory or runtime, which necessitates sparse/approximate variants or batch optimization for large datasets (Huguet et al., 2023).
  • Higher-Order Topological Features: Some TDA-inspired embeddings preserve first homology (loops), but higher Betti numbers or fine-scale topological features may not be preserved without further enhancements (Yan et al., 2018).
  • Extension to Nonlinear or Deep Kernel Regimes: While kernel and nonlinear extensions are plausible, explicit theoretical guarantees often become difficult to establish or computationally intensive (Arpit et al., 2014).

6. Comparative Context, Synthesis, and Recent Developments

SPDR unifies a range of DR techniques from geometric, probabilistic, algebraic, and topological perspectives. Strong empirical and theoretical evidence supports the superiority of SPDR methods for tasks where preservation of specific structure is critical, particularly when compared to generic DR such as PCA or random projections. State-of-the-art techniques now offer integrated objectives that interpolate between local and global preservation (e.g., DREAMS, PCC), enabling practitioners to tune embeddings for the requirements of downstream analysis (Kury et al., 19 Aug 2025, Gildenblat et al., 10 Mar 2025).
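
To illustrate the kind of composite objective these integrated methods optimize, the sketch below blends a local stress term over k-NN pairs with a global stress term over all pairs via a tunable weight; this toy form, and the assumed `knn_idx` array of shape (n, k) holding each point's neighbor indices, are stand-ins rather than the published DREAMS/PCC losses.

```python
import numpy as np
from scipy.spatial.distance import pdist

def composite_stress(X_high, Y_low, knn_idx, lam=0.5):
    """Toy composite objective: lam * local stress (k-NN pairs) +
    (1 - lam) * global stress (all pairs)."""
    n = X_high.shape[0]
    # Local term: distance mismatch restricted to each point's k neighbors.
    rows = np.repeat(np.arange(n), knn_idx.shape[1])
    cols = knn_idx.ravel()
    d_hi = np.linalg.norm(X_high[rows] - X_high[cols], axis=1)
    d_lo = np.linalg.norm(Y_low[rows] - Y_low[cols], axis=1)
    local = np.mean((d_hi - d_lo) ** 2)
    # Global term: mismatch over all pairwise distances.
    global_ = np.mean((pdist(X_high) - pdist(Y_low)) ** 2)
    return lam * local + (1 - lam) * global_
```

Setting lam near 1 recovers a purely local embedding objective, while lam near 0 approaches classical metric-MDS stress, mirroring the tunable local/global trade-off described above.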

A comparative summary of canonical SPDR categories and their preserved structures:

| SPDR Method | Preserved Structure | Principle |
| --- | --- | --- |
| DPLM / SPD manifold | Local Riemannian geometry | Distance to local (Karcher) mean |
| Subspace SPDR | Union of independent subspaces | Principal vectors, $2K$ theorem |
| HeatGeo | Manifold geodesics | Heat kernel, Varadhan's formula |
| Homology landmark | Persistent topology (homology) | Mapper skeleton, Isomap |
| Correlation/Cluster (PCC) | Pearson/Spearman correlations, clusters | Correlation + classification losses |
| Model-reduction SPDR | System-theoretic invariants | Inner-product / symplectic projections |

The field continues to extend SPDR paradigms to deeper architectures, unsupervised discovery of structure, and efficient, scalable frameworks for massive data sets—maintaining mathematical rigor and empirical fidelity to the structures that constitute meaningful scientific, industrial, or societal knowledge.
