A Geometric Analysis of PCA (2510.20978v1)
Abstract: What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer of maximum rotation less than $\pi/4$.
Explain it Like I'm 14
Overview
This paper studies how well Principal Component Analysis (PCA) works. PCA is a common tool that turns high‑dimensional data (lots of numbers per example) into fewer numbers while keeping as much important information as possible. The authors ask a focused question: what feature of the data determines PCA’s “extra error” — the amount by which PCA’s reconstruction is worse than the best possible reconstruction?
They give a precise, math‑based answer, and they do it by looking at PCA through the lens of geometry.
Key Objectives
The paper sets out to do three things, all explained in simple terms:
- Figure out exactly which parts of the data distribution control PCA’s extra error under the usual “reconstruction loss” (how far the reconstructed data is from the original).
- Describe what happens to PCA’s error when you collect more and more data — in the long run, does the error follow a predictable pattern?
- Give a practical, finite‑sample guarantee: for a given number of data points, how big can the extra error be with high probability?
Methods and Approach (with everyday analogies)
The authors use a geometric view of PCA:
- Think of all possible k‑dimensional subspaces (like planes through the origin when k=2) inside a d‑dimensional space. The set of all such subspaces is called the Grassmann manifold. You can imagine it as a curved “map” of all possible subspaces.
- Distances on this map are defined by “principal angles” — how much you need to rotate one subspace to align it with another. This is like measuring how far you need to tilt a plane to match another plane.
- A “geodesic” is the shortest path on this curved map, similar to the shortest path on the surface of a sphere (like the Earth). Following a geodesic here means smoothly rotating one subspace toward another at a constant speed.
- PCA picks the subspace that minimizes reconstruction loss. That loss, for this geometric setup, can be written as a version of the “block Rayleigh quotient,” which is a formula that favors directions where the data varies the most.
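To make these geometric quantities concrete, here is a minimal Python sketch (ours, not code from the paper; NumPy/SciPy only, with toy data) that computes the principal angles between two subspaces and evaluates the empirical reconstruction loss, which differs from the negative block Rayleigh quotient only by an additive constant:

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
d, k, n = 10, 2, 500

# Toy data: most variance lives in the first k coordinates (illustrative only).
X = rng.normal(size=(n, d)) * np.concatenate([np.full(k, 3.0), np.ones(d - k)])

# PCA subspace: top-k eigenvectors of the sample covariance.
S = (X.T @ X) / n
eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
U_hat = eigvecs[:, -k:]                       # d x k orthonormal basis

# "True" subspace in this toy model: the first k coordinate axes.
U_star = np.eye(d)[:, :k]

# Principal angles (radians) between the two subspaces; the Euclidean norm of
# this vector is the geodesic distance on the Grassmann manifold.
angles = subspace_angles(U_hat, U_star)
print("principal angles:", angles, "geodesic distance:", np.linalg.norm(angles))

# Empirical reconstruction loss of the subspace spanned by an orthonormal U:
# average squared distance from each point to its projection onto span(U).
def reconstruction_loss(U, X):
    residual = X - X @ U @ U.T
    return np.mean(np.sum(residual**2, axis=1))

# Minimizing this loss is the same as maximizing the block Rayleigh quotient
# trace(U^T S U), so PCA's choice U_hat attains the smallest loss.
print("loss at U_hat :", reconstruction_loss(U_hat, X))
print("loss at U_star:", reconstruction_loss(U_star, X))
```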
Two technical ideas make the analysis work:
- Asymptotic statistics: As you get more data, the error behaves more predictably. The authors prove a central limit theorem on this curved space: after scaling, the error looks like a normal (Gaussian) random variable. Translation: the tiny wobbles of PCA’s chosen direction settle into a bell‑curve pattern when you have lots of data.
- Generalized self‑concordance: Along geodesics that start at the true best subspace and don’t rotate too far (less than 45 degrees), the loss function behaves nicely. This “niceness” means a second‑order Taylor expansion (using slopes and curvatures) gives a tight approximation. Analogy: near the bottom of a bowl‑shaped valley, the terrain is smooth enough that measuring slope and curvature tells you almost everything you need.
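For orientation, the classical definition of self-concordance and one common generalized variant are shown below; this is standard background only and a schematic for the geodesic version proved in the paper, whose precise constants and exponents are not reproduced here. Write $\varphi(t) = f(\gamma(t))$ for the loss $f$ restricted to a geodesic $\gamma$:

```latex
\[
  \underbrace{\;\lvert \varphi'''(t) \rvert \;\le\; 2\,\varphi''(t)^{3/2}\;}_{\text{classical self-concordance}}
  \qquad\qquad
  \underbrace{\;\lvert \varphi'''(t) \rvert \;\le\; C\,\varphi''(t)\;}_{\text{a common generalized variant}}
\]
% Either condition bounds the third derivative by the second, which is what
% makes a second-order Taylor model of the loss accurate on a neighborhood
% whose size is controlled by the constant.
```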
For finite samples, the authors combine:
- A global step, using standard stability results (you can think of them as saying, “if the data’s main directions are clearly separated, PCA won’t stray too far”).
- A local step, using their self‑concordance result to tightly control the error near the best subspace.
A key, natural condition throughout is having a gap between how strong the k‑th and the (k+1)‑th directions are. This “eigengap” is the difference in the data’s variance along those directions. If this gap is positive, the top k directions are clearly better than the rest.
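A minimal sketch (ours; the function name `eigengap` and the toy data are illustrative) of checking this condition directly from the sample covariance spectrum:

```python
import numpy as np

def eigengap(X, k):
    """Return the sample eigengap lambda_k - lambda_{k+1} (1-indexed, descending)."""
    S = np.cov(X, rowvar=False)
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues in descending order
    return lam[k - 1] - lam[k]

rng = np.random.default_rng(1)
# Toy data: two strong directions, the rest at noise level.
X = rng.normal(size=(2000, 8)) * np.array([4, 3, 1, 1, 1, 1, 1, 1.0])
print(eigengap(X, k=2))   # clearly positive: the top-2 subspace is well separated
```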
Main Findings
Here’s what the authors discover and why it matters:
- Consistency: With enough data, PCA’s chosen subspace gets as close as you like to the true best subspace. In practice, this means PCA learns the right directions if you have sufficient samples.
- Asymptotic normality: The “direction error” (how much PCA’s subspace tilts away from the true one) shrinks like 1/√n (n = number of samples), and the pattern of this shrinkage is Gaussian. The exact spread depends on:
- How the data projects onto the top k directions versus the remaining d−k directions.
- The eigengaps (how much more variance the top directions have compared to the next ones).
- Excess risk behaves like 1/n: The extra reconstruction error decreases roughly in proportion to 1/n. More precisely, if you multiply the extra error by n, it approaches the squared size of a Gaussian term. This tells you not just the average behavior but the entire distribution in the large‑sample limit (the simulation sketch after this list illustrates both the 1/√n and 1/n rates).
- Matching finite‑sample bound: They prove a high‑probability upper bound for the extra error that mirrors the asymptotic form (up to constants) once you have enough samples. This bound depends on:
- Fourth moments of the data (a measure of how heavy‑tailed or extreme the data can be).
- The eigengap.
- Your chosen failure probability δ (how often you allow the bound to fail).
- Importantly, heavy‑tailed data makes the bound looser — a reminder that standard PCA is sensitive to outliers.
- Self‑concordance of the loss: The block Rayleigh quotient (the core loss behind PCA) is “generalized self‑concordant” along geodesics from the best subspace, provided you don’t rotate more than 45 degrees. In plain terms, the loss is well‑behaved near the optimum, so second‑order approximations are trustworthy there. This property underpins the tight local analysis.
- Special cases and extensions:
- In the “spiked covariance” model (signal in k directions plus isotropic noise), the formulas simplify and you can see clean dependence on noise level and signal strength.
- The approach extends beyond PCA to estimating leading eigenspaces of general symmetric matrices (like graph adjacency matrices), covering problems such as spectral clustering and community detection.
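The simulation sketch below (ours, under a spiked covariance model with Gaussian data, tying into the special case above) illustrates the two advertised rates: the subspace angle error shrinks like 1/√n and the excess reconstruction risk like 1/n, so the scaled columns printed at the end should stay of roughly constant order as n grows.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
d, k = 20, 3
# Spiked covariance: strong signal in k directions plus isotropic noise.
spikes = np.array([10.0, 8.0, 6.0])
sigma2 = 1.0
std = np.sqrt(np.concatenate([spikes + sigma2, np.full(d - k, sigma2)]))
Sigma = np.diag(std**2)                   # population covariance
U_star = np.eye(d)[:, :k]                 # population principal subspace

# Population reconstruction risk of U_star: sum of the trailing eigenvalues.
optimal_risk = (d - k) * sigma2

for n in [500, 2000, 8000, 32000]:
    X = rng.normal(size=(n, d)) * std      # Gaussian data with covariance Sigma
    _, V = np.linalg.eigh((X.T @ X) / n)
    U_hat = V[:, -k:]
    angle_err = np.linalg.norm(subspace_angles(U_hat, U_star))
    # Population excess risk of U_hat: its reconstruction risk minus the optimum.
    excess = np.trace(Sigma) - np.trace(U_hat.T @ Sigma @ U_hat) - optimal_risk
    print(f"n={n:6d}  sqrt(n)*angle={np.sqrt(n)*angle_err:6.2f}  n*excess={n*excess:7.2f}")
```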
Implications and Impact
- Pinpointing the data property that drives PCA’s extra error: The key drivers are how strongly the data mixes top and bottom directions (through certain covariance terms) and how large the eigengaps are. Bigger gaps and less mixing mean less extra error.
- Predicting sample sizes: The results say when the asymptotic behavior kicks in and how many samples you need for the finite‑sample bound to be tight. This helps practitioners plan data collection.
- Understanding limits: If your data has heavy tails or small eigengaps, PCA can struggle, and the guarantees get weaker. That highlights when robust methods (that handle outliers) may be needed.
- Geometric tools for learning: Viewing PCA on the Grassmann manifold and using geodesic‑based analysis could inspire similar analyses for other algorithms that optimize over subspaces or other curved spaces.
In short, the paper gives a clear, geometric explanation of PCA’s performance, nails down the exact data features that control its extra error, and provides both long‑run and practical finite‑sample guarantees.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a concise, actionable list of what remains uncertain or unexplored based on the paper’s results.
- Relaxing the eigengap assumption: develop asymptotic and non-asymptotic theory for PCA when λ_k = λ_{k+1} (non-unique minimizers on a submanifold), including limit distributions for set-valued estimators and inference with flat directions on the Grassmann manifold.
- Berry–Esseen/Edgeworth refinements: quantify the finite-sample accuracy of the asymptotic normality (and of the excess-risk limit law), providing explicit rates and higher-order corrections under mild moment conditions.
- Weaker moment conditions: replace the fourth-moment-type assumptions (finiteness of Λ_{ijst} and coordinate fourth moments) with weaker conditions (e.g., Lindeberg-type, finite (2+ε)-moments), and characterize the minimal assumptions for the CLT and risk distribution to hold.
- Heavy-tailed robustness: design and analyze robust PCA estimators (e.g., via robust covariance estimators, truncation, or median-of-means on manifolds) that attain excess-risk bounds with log(1/δ)-type tails under heavy-tailed data, and compare their constants and sample complexity to ERM/PCA.
- High-probability dependence on δ: improve the 1/δ dependence in the non-asymptotic bound to log(1/δ) under sub-Gaussian or sub-exponential tails, and identify necessary conditions under which this improvement is impossible.
- Tightening the global step: replace the Davis–Kahan-based global control with sharper, geometry-aware arguments to reduce the dominant third term in the sample complexity, and determine the exact “critical radius” ensuring entry into the π/4 neighborhood.
- Global landscape analysis: characterize the nonconvex landscape of the reconstruction risk more precisely (number/type of critical points, attraction basins, strict-saddle structure), to enable sharper global-to-local sample complexity and algorithmic guarantees.
- Extending self-concordance: generalize the geodesic generalized self-concordance beyond geodesics emanating from the minimizer and beyond the π/4 angle restriction; identify the largest region of geodesic “near-convexity” and the optimal constants.
- Self-concordance without eigengap: determine whether analogous generalized self-concordance inequalities hold when the minimizer set is non-unique (flat manifold of solutions), and how they can be exploited for inference.
- Alternative loss metrics: extend asymptotic and non-asymptotic characterizations to other natural losses (e.g., projection-Frobenius, chordal distance, principal-angle functionals), including matching quantile bounds.
- Minimax lower bounds: establish non-asymptotic and asymptotic lower bounds for the excess risk (and projection error) to certify the optimality of the proposed upper bounds beyond the Gaussian example.
- Quantile-level matching: sharpen constants in the non-asymptotic quantile upper bounds to match the asymptotic quantiles more tightly, and identify distribution classes where the constants are optimal.
- Interpretable variance proxies: derive tractable upper/lower bounds for the variance parameters 𝒱 and ν (Remark 5.1) in broad distribution families (e.g., sub-Gaussian, elliptical, bounded kurtosis), and study their sensitivity to spectral decay and kurtosis.
- Generalized PCA (random symmetric matrices): extend the finite-sample bound (Theorem 5.1) to the general A-setting (Section 4.1) with explicit variance parameters and sample complexity in terms of moments of A and eigengaps of M.
- Dependent data: extend both the CLT and finite-sample analysis to time series and other dependent settings (e.g., mixing processes), including concentration for sample covariance under dependence.
- High-dimensional asymptotics: analyze regimes where d and n grow jointly (e.g., d/n → γ, k possibly growing), and connect to random matrix theory and spiked models with phase transitions (BBP), assessing how the excess-risk characterization changes.
- Model selection for k: quantify the impact of data-driven selection of k on the excess risk and its distribution, and develop joint procedures with provable guarantees on both subspace and k.
- Mean estimation effects: incorporate the estimation of the mean (centering step) into the asymptotic and non-asymptotic analyses under weak moments and dependence, and isolate additional error terms.
- Algorithmic implications: exploit generalized self-concordance to design and analyze Riemannian Newton/trust-region methods with fast local rates for PCA/generalized PCA, and compare their statistical–computational tradeoffs to ERM and Riemannian SGD.
- Averaged Riemannian SGD: rigorously derive the asymptotic covariance for averaged Riemannian SGD in PCA without unverified assumptions, and provide finite-sample risk quantiles akin to Theorem 5.1.
- Infinite-dimensional settings: extend the framework to kernel PCA and functional PCA (in Hilbert spaces), establishing CLTs and non-asymptotic bounds with appropriate eigengap-type conditions and compactness/regularization.
- Complex-valued and SVD settings: generalize the theory to complex Grassmannians and to singular-vector problems (e.g., CCA/SVD), including self-concordance-type results on Stiefel manifolds.
- Small eigengaps: refine the dependence on 1/(λ_j − λ_{k+i}) to determine exact thresholds where PCA becomes statistically unstable, and explore adaptive procedures that remain reliable with closely spaced eigenvalues.
- Empirical validation: conduct systematic experiments across distributions (Gaussian, elliptical, heavy-tailed) to assess finite-sample normality, quantile predictions, and tightness of the sample complexity terms and constants.
- Confidence sets on manifolds: use the asymptotic normality in the tangent space to construct valid confidence sets for the principal subspace (via the exponential map) and evaluate their finite-sample coverage.
- Beyond reconstruction risk: investigate whether similar asymptotic and non-asymptotic characterizations hold for alternative PCA objectives (e.g., maximizing explained variance under constraints, sparse PCA surrogates), and how manifold geometry interacts with regularization.
Practical Applications
Immediate Applications
Below are practical, deployable-now uses of the paper’s findings, organized by sector, with notes on workflow implications and assumptions.
- Risk-aware PCA planning and certification (Software, Data Science, Healthcare, Finance)
- Application: Compute high-probability upper bounds and asymptotic quantiles for PCA reconstruction error to certify dimensionality reduction quality in pipelines (e.g., model cards, audit reports).
- Workflow/tool: Add a “PCA Risk Estimator” to existing libraries (e.g., scikit-learn, PyTorch) that:
- Estimates eigengaps and fourth-moment terms (Λ) from data.
- Outputs the 1−δ quantile of excess risk and a recommended sample size n using Theorem 5 and Corollary 1.
- Flags instability when the eigengap is small or the data are heavy-tailed.
- Assumptions/dependencies: Positive eigengap (λ_k > λ_{k+1}); finite fourth moments; accuracy of plug-in estimators for Λ; heavier tails degrade guarantees (δ dependence becomes worse).
- Sample-size calculators for PCA deployments (Software, Healthcare, Finance, Manufacturing)
- Application: Plan data collection to meet reconstruction-error targets with high probability (e.g., for clinical imaging PCA denoising, portfolio factor stability).
- Workflow/tool: A “PCA Sample Size Planner” that takes preliminary data, desired error threshold ε, and failure probability δ, and returns minimal n satisfying the non-asymptotic bound.
- Assumptions/dependencies: Requires estimates of variance parameters (𝓥, ν), spectrum of Σ, and operator-norm moment terms (𝓢, r(n)); Gaussian simplifications available (Example 2).
- Principled selection of k (dimension) using risk curves (All sectors using PCA)
- Application: Choose k by minimizing the derived excess-risk expression for candidate k values, rather than relying solely on explained variance.
- Workflow/tool: “Risk-vs-Compression” curves that plot asymptotic and finite-sample bounds against k, aiding stakeholders to trade off error and dimensionality.
- Assumptions/dependencies: Positive eigengap for evaluated k; stable estimation of Λ and eigenvalues.
- Subspace drift monitoring in streaming systems (Manufacturing, Energy, IT Operations)
- Application: Detect changes in system behavior by monitoring geodesic distances between current PCA subspace and baseline; use CLT-based thresholds for alerts.
- Workflow/tool: Control charts on the Grassmannian using principal angles and asymptotic normality of subspace error; alert when distances exceed calibrated bounds (a monitoring sketch appears after this list of immediate applications).
- Assumptions/dependencies: Stationary periods for calibration; finite moments; changes in covariance structure drive true drift.
- Spectral methods quality control for graphs (Software, Social Networks, Telecom)
- Application: Quantify and plan sample sizes for spectral clustering/community detection using adjacency/Laplacian eigenspaces (Remark on generalized PCA).
- Workflow/tool: For edge-sampling pipelines, compute required n to guarantee quality of spectral embeddings; provide risk bounds analogous to PCA reconstruction.
- Assumptions/dependencies: Symmetric matrix estimation (e.g., adjacency, Laplacian); analogous moment conditions for matrix entries (Λ generalized); eigengap in target eigenspace.
- Heavy-tail robustness gating (Finance, Cybersecurity, Retail)
- Application: Automatically warn and switch to robust alternatives when heavy tails may invalidate ERM-like PCA guarantees.
- Workflow/tool: A diagnostic that checks δ scaling, tail behavior, and eigengaps; if poor, recommend robust PCA (e.g., median-of-means covariance) or trimmed data strategies.
- Assumptions/dependencies: Reliable tail diagnostics; availability of robust estimators; performance trade-offs accepted.
- Algorithm tuning and validation on Grassmann manifolds (Software, Robotics)
- Application: Use geodesic generalized self-concordance to inform step sizes and local curvature in Riemannian gradient/trust-region methods for subspace estimation.
- Workflow/tool: Add curvature-aware line search rules and local Taylor-model checks to manifold optimization routines; validate convergence using the provided bounds.
- Assumptions/dependencies: Operations constrained to neighborhoods within π/4 principal angle to minimizer; accurate computation of principal angles and exponential/log maps.
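As a concrete illustration of the drift-monitoring and locality checks described above, here is a hedged Python sketch (the function names, thresholds, and kurtosis proxy are our illustrative choices, not tooling from the paper). It compares the PCA subspace fitted on a new data window against a baseline subspace via principal angles, reports the geodesic distance, checks the π/4 condition that the paper's local analysis relies on, and flags heavy tails as a crude proxy for the fourth-moment terms.

```python
import numpy as np
from scipy.linalg import subspace_angles

def top_k_subspace(X, k):
    """Orthonormal basis of the top-k eigenspace of the sample covariance."""
    S = np.cov(X, rowvar=False)
    _, V = np.linalg.eigh(S)          # eigenvalues in ascending order
    return V[:, -k:]

def subspace_drift_report(X_window, U_baseline, angle_alert=0.2, kurtosis_tol=10.0):
    """Illustrative monitoring report; thresholds are placeholders to be calibrated,
    e.g., from the CLT-based normal approximation during a stationary period."""
    k = U_baseline.shape[1]
    U_now = top_k_subspace(X_window, k)
    angles = subspace_angles(U_now, U_baseline)        # principal angles, radians
    geodesic_dist = float(np.linalg.norm(angles))
    # The paper's local (self-concordance based) analysis applies within a
    # maximum rotation of pi/4 from the minimizer; check that proxy here.
    within_pi_over_4 = bool(np.max(angles) < np.pi / 4)
    # Crude heavy-tail proxy: largest coordinate-wise fourth moment of standardized data.
    Z = (X_window - X_window.mean(axis=0)) / X_window.std(axis=0)
    max_kurtosis = float(np.mean(Z**4, axis=0).max())
    return {
        "geodesic_distance": geodesic_dist,
        "max_principal_angle": float(np.max(angles)),
        "within_pi_over_4": within_pi_over_4,
        "drift_alert": geodesic_dist > angle_alert,
        "heavy_tailed": max_kurtosis > kurtosis_tol,
    }

# Example: baseline window vs. a window whose dominant directions have moved.
rng = np.random.default_rng(0)
d, k, n = 12, 2, 4000
base = rng.normal(size=(n, d)) * np.r_[4.0, 3.0, np.ones(d - 2)]
U0 = top_k_subspace(base, k)
drifted = base.copy()
drifted[:, [1, 2]] = drifted[:, [2, 1]]      # swap two coordinates to induce drift
print(subspace_drift_report(drifted, U0))
```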
Long-Term Applications
These uses will benefit from further research, scaling, or development before widespread deployment.
- Riemannian self-concordant optimization methods (Software, Robotics, Control)
- Application: Design new manifold-optimization algorithms (e.g., Newton-like, trust-region) with global convergence guarantees leveraging geodesic self-concordance.
- Potential products/tools: “Riemannian Self-Concordant Optimizer” for subspace problems across PCA, CCA, and related tasks.
- Dependencies: Stronger globalization strategies (beyond local π/4 neighborhoods), robust curvature estimation, theoretical guarantees for noisy objectives.
- Robust PCA with heavy-tail guarantees (Finance, Security Analytics)
- Application: Integrate robust estimators (median-of-means, shrinkage) into the analysis to recover favorable δ dependence and tail-insensitive bounds.
- Potential products/tools: “Robust PCA Risk Certificates” and sample-size calculators customized for tail-heavy domains.
- Dependencies: New theory extending bounds to robust covariance estimators; performance/speed trade-offs.
- Kernel and functional PCA extensions (Healthcare imaging, NLP, Earth Observation)
- Application: Extend geometric and statistical analysis to infinite-dimensional settings (RKHS, functional data), enabling risk-aware nonlinear dimension reduction.
- Potential products/tools: “Risk-Aware Kernel PCA” modules for large-scale feature maps and functional data streams.
- Dependencies: Infinite-dimensional Grassmannian theory; computational feasibility for large kernels; practical approximations (Nyström, random features).
- Risk-aware AutoML for dimensionality reduction (Software, Enterprise AI)
- Application: Automatic selection of PCA/graph spectral methods and k using bound-driven criteria; integrate into pipelines for compliance and stability.
- Potential products/tools: AutoML components that target specified reconstruction-error quantiles and certify outputs.
- Dependencies: Robust plug-in estimators of Λ and eigengaps; reliable calibration across datasets.
- Privacy-preserving PCA planning (Policy, Privacy Tech, Healthcare)
- Application: Calibrate sample sizes and DP noise to meet risk targets while protecting data; publish uncertainty certificates compliant with privacy constraints.
- Potential products/tools: “DP-PCA Planner” that balances reconstruction risk and privacy budget (ε, δ) using the paper’s bounds.
- Dependencies: DP mechanisms for fourth-moment and covariance estimation; analysis of bound degradation under DP noise.
- Online/streaming PCA with change detection (IoT, Finance, Ops)
- Application: Combine asymptotic normality with sequential testing to detect regime shifts in real time; maintain risk certificates under rolling windows.
- Potential products/tools: Streaming subspace trackers with CLT-based alarms and dynamic sample-size management.
- Dependencies: Theory for non-stationary covariances; adaptive estimators of Λ; computationally efficient principal-angle updates.
- Contrastive and representation learning via eigenspace estimation (Software, ML Research)
- Application: Use generalized block Rayleigh-quotient analysis to plan sample sizes and error bounds in contrastive objectives and spectral pretraining.
- Potential products/tools: “Spectral Representation Planner” for contrastive pipelines where leading/trailing eigenspaces drive representations.
- Dependencies: Extensions of Λ to task-specific matrices; empirical validation on large-scale datasets.
- Open-source “PCA Assurance” toolkit (Software, Academia)
- Application: Package routines to compute principal angles, geodesics, asymptotic/finite-sample risk bounds, and sample-size plans across PCA and generalized eigenspace tasks.
- Dependencies: Efficient numerical routines for Grassmann geometry; robust statistical estimators; documentation and benchmarks across domains.
Glossary
- Adjacency matrix: A square matrix representing weighted connections between nodes in a graph. "consider the case where M is the adjacency matrix of an undirected weighted graph with non-negative weights."
- Asymptotic distribution: The limiting probability distribution of a statistic as the sample size grows. "derive the asymptotic distribution of its excess risk under the reconstruction loss."
- Asymptotic normality: The property that a properly scaled estimator converges in distribution to a normal distribution as sample size increases. "who established its asymptotic normality."
- Block Rayleigh quotient: The generalization of the Rayleigh quotient to subspaces, measuring energy captured by a k-dimensional subspace. "the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer of maximum rotation less than $\pi/4$."
- Central limit theorem: A fundamental result stating that sums of i.i.d. variables (under conditions) converge to a normal distribution. "We establish a central limit theorem for the error of the principal subspace estimated by PCA"
- Community detection: Methods for identifying groups of densely connected nodes in graphs. "As examples of potential applications, we mention spectral clustering \cite[e.g.][]{ng2001spectral}, community detection \cite[e.g.][]{abbe2018community}, and contrastive learning \cite[e.g.][]{haochen2021provable}."
- Contrastive learning: Representation learning by contrasting positive and negative pairs. "As examples of potential applications, we mention spectral clustering \cite[e.g.][]{ng2001spectral}, community detection \cite[e.g.][]{abbe2018community}, and contrastive learning \cite[e.g.][]{haochen2021provable}."
- Covariance operator: A linear operator mapping matrices to their covariance-weighted expectation, generalizing covariance to random matrices. "we define its covariance operator to be the linear map $\Cov(W): \mathbb{R}^{d \times k} \to \mathbb{R}^{d \times k}$ given by $\Cov(W)[A] = \mathbb{E}\bigl[\langle W, A\rangle_{F}\, W\bigr]$."
- Davis–Kahan theorem: A matrix perturbation bound relating changes in eigenspaces to matrix perturbations. "Matrix perturbation bounds \cite{stewart1990matrix}, most famously the Davis-Kahan theorem \cite{davis1970rotation,yu2015useful}"
- Direct sum: An operation combining subspaces so each vector decomposes uniquely into components from each subspace. "where $\oplus$ is the direct sum of subspaces"
- Eigengap: The gap between consecutive eigenvalues, often controlling stability of eigenspaces. "Under the eigengap condition, the excess risk bound in Theorem \ref{thm:asymptotics} extends the result of \citet[Proposition 2.14]{reiss2020nonasymptotic}."
- Empirical risk minimization (ERM): A learning paradigm that minimizes average loss on data to estimate parameters. "PCA is viewed as an instance of empirical risk minimization"
- Excess risk: The difference between the risk of an estimator and the optimal (population) risk. "Our main interest is in the excess risk, as it directly measures how well PCA performs on the reconstruction task."
- Exponential map: A map sending a tangent vector to the endpoint of its geodesic at unit time on a manifold. "The exponential map at $[U]$ in the direction $\xi$ is then defined by $\Expo_{[U]}(\xi) = \gamma(1)$."
- Frobenius inner product: The inner product on matrices given by element-wise products summed; equals the trace of $A^T B$. "apply the Frobenius inner product"
- Frobenius norm: The matrix norm equal to the square root of the sum of squares of entries; the Euclidean norm of singular values. "the asymptotic distribution of the excess risk as the squared Frobenius norm of a Gaussian matrix."
- Gaussian concentration: Tail bounds and concentration inequalities specific to Gaussian distributions. "a simple consequence of Gaussian concentration"
- Geodesic: A curve of zero acceleration on a manifold; the shortest path locally. "The geodesic starting at $[U]$ in the direction $\xi$ is the curve $\gamma: [0, 1] \to \Gr(d, k)$"
- Grassmann manifold (Grassmannian): The manifold of k-dimensional subspaces of a d-dimensional space. "The space of equivalence classes under this relation is known as the Grassmann manifold $\Gr(d, k) = \St(d, k)/\sim$."
- Isotropic noise: Noise with equal variance in all directions; covariance proportional to the identity. "corrupted with isotropic noise"
- Laplacian matrix: A matrix representation of a graph encoding connectivity; used in spectral methods. "A similar argument can be made for the estimation of the trailing $k$-dimensional eigenspace of the Laplacian matrix."
- Logarithmic map: The local inverse of the exponential map, mapping a point to the tangent vector generating the geodesic. "Where well-defined, the logarithmic map at $[U]$ is the inverse of the exponential map"
- M-estimator: An estimator defined as the minimizer of a sample average of a loss function. "we view PCA as an M-estimator"
- Matrix Bernstein inequality: A concentration inequality bounding deviations of sums of random matrices. "They are obtained from the matrix Bernstein inequality"
- Non-asymptotic: Pertaining to finite-sample analysis, not relying on limits as sample size grows. "We obtain a non-asymptotic upper bound on the excess risk of PCA"
- Noncommutative Khintchine inequality: An inequality bounding norms of random matrices via Rademacher/Gaussian series. "using the noncommutative Khintchine inequality \cite{tropp2015introduction,van2017structured}"
- Orthogonal projector: A matrix that projects vectors onto a subspace along its orthogonal complement; idempotent and symmetric. "PCA finds an orthogonal projector $UU^T \in \mathbb{R}^{d \times d}$ onto a $k$-dimensional subspace"
- Polyak–Łojasiewicz inequality: A condition linking function suboptimality to gradient norm, implying linear convergence. "those of \cite{zhang2016riemannian} who showed that $F$ satisfies a version of the Polyak--\L{}ojasiewicz inequality"
- Principal angles: Angles characterizing the orientation difference between two subspaces. "The $j$-th principal angle $\theta_{j}([U], [V]) \in [0, \pi/2]$ satisfies $\cos(\theta_j([U], [V])) = s_j$"
- Principal subspace: The subspace spanned by the top k eigenvectors of a covariance (or symmetric) matrix. "We establish a central limit theorem for the error of the principal subspace estimated by PCA"
- Quantile: The value below which a given proportion of data falls; the inverse CDF at a probability level. "its $1-\delta$ quantile, for $\delta \in [0, 1]$, is defined by"
- Rayleigh quotient: A scalar measuring energy of a vector with respect to a symmetric matrix; maximized by eigenvectors. "Recall that eigenvectors corresponding to the largest eigenvalue are maximizers of the Rayleigh quotient."
- Reconstruction loss: The squared error between data and its projection onto a subspace; the PCA objective. "derive the asymptotic distribution of its excess risk under the reconstruction loss."
- Reconstruction risk: The expected reconstruction loss (population version) for a subspace. "we prove that the reconstruction risk is generalized self-concordant"
- Riemannian distance: The intrinsic distance on a manifold given by the length of shortest geodesics. "the principal angles give us an explicit expression for the Riemannian distance between $[U]$ and $[V]$"
- Riemannian manifold: A smooth manifold equipped with an inner product on tangent spaces varying smoothly with position. "Our analysis takes place on the Grassmannian $\Gr(d, k)$, which admits the structure of a Riemannian manifold."
- Riemannian metric: The smoothly varying inner product on tangent spaces defining geometry on a manifold. "The tangent space is equipped with an inner product $\langle\cdot,\cdot\rangle_{[U]}$ at $[U]$."
- Riemannian SGD: Stochastic gradient descent generalized to Riemannian manifolds. "a similar expression for the asymptotic variance of averaged Riemannian SGD on PCA"
- Schatten-p norm: The p-norm of singular values of a matrix; generalizes the Frobenius (p=2) and operator (p=∞) norms. "the Schatten-$p$ norm of $G$"
- Self-concordance (generalized self-concordance): A curvature property bounding third derivatives by second derivatives, aiding robust Taylor approximations. "is generalized self-concordant along geodesics emanating from its minimizer"
- Singular value decomposition (SVD): Factorization of a matrix into orthonormal factors and singular values. "$\mathrm{lift}_{U}(\xi) = PSQ^T$"
- Spiked covariance model: A model where covariance is low-rank plus isotropic noise; used in PCA analysis. "we consider the spiked covariance model \cite{johnstone2001distribution, nadler2008finite}"
- Tangent space: The vector space of velocities of curves through a point on a manifold; linearization of the manifold at the point. "The tangent space of $\Gr(d, k)$ at $[U]$, denoted $T_{[U]}\Gr(d, k)$, has dimension $k(d-k)$."
- Uniform convergence analysis: A theoretical framework bounding worst-case deviations between empirical and population losses across hypothesis classes. "variants of the uniform convergence analysis apply."
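For reference, the standard relation behind the "Principal angles" and "Riemannian distance" entries above (textbook Grassmannian background, stated in our own notation rather than quoted from the paper): if $U, V \in \St(d, k)$ are orthonormal bases and $s_1 \ge \dots \ge s_k$ are the singular values of $U^T V$, then

```latex
\[
  \cos\big(\theta_j([U],[V])\big) = s_j, \quad j = 1, \dots, k,
  \qquad\text{and}\qquad
  d\big([U],[V]\big) = \Big(\sum_{j=1}^{k} \theta_j([U],[V])^2\Big)^{1/2},
\]
% i.e., the geodesic (Riemannian) distance on Gr(d, k) is the Euclidean norm
% of the vector of principal angles.
```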