Nyström Method for Low-Rank Approximation
- The Nyström method is a numerical technique that approximates large positive semidefinite matrices by sampling a subset of their columns and rows, enabling scalable computation.
- It utilizes strategies such as uniform, leverage-score, and clustered sampling, offering provable error bounds and improved performance in kernel methods and linear algebra tasks.
- The method finds practical applications in machine learning, signal processing, and numerical PDEs, enabling efficient computations for high-dimensional and large-scale problems.
The Nyström method is a foundational numerical technique for the low-rank approximation of positive semidefinite matrices, kernel Gram matrices, and more general applications involving integral operators. At its core, the method constructs an approximate matrix using only a small collection of its columns and/or rows, enabling scalable computations in contexts where the full matrix is prohibitively large to store or diagonalize. The approach is widely used in machine learning, signal processing, randomized numerical linear algebra, computational physics, and numerical analysis for partial differential and integral equations.
1. Mathematical Foundations and Algorithmic Framework
Given a symmetric positive semidefinite (SPSD) matrix $A \in \mathbb{R}^{n \times n}$, the classical Nyström method proceeds by sampling a small set $S$ of $\ell$ column (and matching row) indices. Let $C = A(:, S) \in \mathbb{R}^{n \times \ell}$ denote the submatrix of sampled columns, and $W = A(S, S) \in \mathbb{R}^{\ell \times \ell}$ the principal submatrix. The Nyström approximation is defined as
$$\hat{A} = C\, W^{+} C^{\top},$$
where $W^{+}$ is the Moore–Penrose pseudoinverse. For most practical cases where $W$ is invertible, $W^{-1}$ may be used. This construction yields a positive semidefinite rank-$\ell$ approximation to $A$. For general matrices (possibly nonsymmetric or rectangular), the "generalized Nyström" or CUR method approximates $A$ by selecting row indices $I$ and column indices $J$ and forming
$$\hat{A} = C\, U^{+} R,$$
with $C = A(:, J)$, $R = A(I, :)$, $U = A(I, J)$.
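As a concrete illustration, the basic construction can be sketched in a few lines of NumPy (a minimal sketch; the eigendecomposition-based pseudoinverse and the truncation threshold are illustrative choices, not prescribed by the method):

```python
import numpy as np

def nystrom(A, idx):
    """Nystrom approximation of an SPSD matrix A from sampled column indices idx.

    Returns a factor F with A_hat = F @ F.T, i.e. A_hat = C W^+ C^T.
    """
    C = A[:, idx]                      # sampled columns, n x l
    W = C[idx, :]                      # principal submatrix, l x l
    # Pseudoinverse of W via eigendecomposition of the small matrix
    w, V = np.linalg.eigh(W)
    pos = w > 1e-12 * w.max()          # drop numerically zero eigenvalues
    F = C @ (V[:, pos] / np.sqrt(w[pos]))
    return F

# Example: a random SPSD matrix of exact rank 5 is recovered (almost surely)
rng = np.random.default_rng(0)
G = rng.standard_normal((100, 5))
A = G @ G.T                            # rank-5 SPSD test matrix
idx = rng.choice(100, size=20, replace=False)
F = nystrom(A, idx)
err = np.linalg.norm(A - F @ F.T) / np.linalg.norm(A)
```

Because the sampled columns of a rank-5 matrix generically span its column space, the relative error here is at the level of machine precision, illustrating the exact-recovery property discussed below.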
In the operator setting, the Nyström technique replaces a continuous integral eigenproblem with a symmetric discretization via quadrature:
$$\int_D k(s, t)\, \phi(t)\, dt = \lambda\, \phi(s)$$
becomes, after quadrature with nodes $t_j$ and weights $w_j$ and symmetrization,
$$\sum_{j} \sqrt{w_i w_j}\, k(t_i, t_j)\, \psi_j = \lambda\, \psi_i, \qquad \psi_i = \sqrt{w_i}\, \phi(t_i),$$
where $K_{ij} = \sqrt{w_i w_j}\, k(t_i, t_j)$ is a symmetric positive definite matrix derived from the covariance kernel and the quadrature weights (Corlay, 2010).
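The quadrature construction can be checked on the Brownian-motion covariance kernel $k(s,t) = \min(s,t)$ on $[0,1]$, chosen here because its Karhunen–Loève eigenvalues are known in closed form, $\lambda_j = ((j - 1/2)\pi)^{-2}$ (the midpoint rule and problem size are illustrative; the cited work treats fractional Brownian motion with more refined quadrature):

```python
import numpy as np

# Nystrom discretization of the integral eigenproblem for k(s, t) = min(s, t).
n = 400
t = (np.arange(n) + 0.5) / n            # midpoint quadrature nodes
w = np.full(n, 1.0 / n)                 # quadrature weights
K = np.minimum.outer(t, t)              # kernel matrix k(t_i, t_j)
Ks = np.sqrt(np.outer(w, w)) * K        # symmetrized: sqrt(w_i w_j) k(t_i, t_j)
lam = np.linalg.eigvalsh(Ks)[::-1]      # eigenvalues in descending order
exact = 1.0 / (((np.arange(1, 6) - 0.5) * np.pi) ** 2)
rel_err = np.abs(lam[:5] - exact) / exact
```

Even the plain midpoint rule reproduces the leading eigenvalues to within a fraction of a percent at this resolution; higher-order quadrature or extrapolation sharpens this further.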
2. Landmark Selection and Accuracy Guarantees
The quality of the Nyström approximation crucially depends on the strategy used to select sampled columns ("landmarks"). Common approaches include:
- Uniform random sampling: Simple and often effective for incoherent matrices.
- Leverage-score or coherence-based sampling: Columns are sampled in proportion to their statistical leverage in the top eigenspace or their norm. This yields provably stronger bounds on approximation error, both for spectral- and Frobenius-norms, and is especially advantageous for matrices with high coherence or ill-conditioned spectra (Musco et al., 2016, Altschuler et al., 2018).
- K-means or clustered selection: Particularly effective in kernel methods, where cluster centroids minimize the expected reconstruction error over the data manifold (Pourkamali-Anaraki et al., 2016).
- Greedy or rank-revealing QR selection: Maximizing the volume or minimizing the condition number of $W$ (e.g., using RRQR or DPP approaches) yields stable and accurate approximations (Bucci et al., 19 Nov 2025).
Theoretical error bounds depend on the matrix spectrum and its coherence. For rank-$r$ SPSD matrices, Talwalkar and Rostamizadeh show that sampling $O(\mu r \log r)$ columns (where $\mu$ is the coherence) suffices for exact recovery with high probability (Talwalkar et al., 2010). For general PSD matrices, sampling according to ridge leverage scores at a regularization parameter $\lambda$ achieves a spectral approximation guarantee of the form
$$\hat{A} \preceq A \preceq \hat{A} + \lambda I,$$
with sample complexity $O\!\big(d_{\mathrm{eff}}(\lambda)\log d_{\mathrm{eff}}(\lambda)\big)$, where $d_{\mathrm{eff}}(\lambda) = \operatorname{tr}\!\big(A (A + \lambda I)^{-1}\big)$ is the $\lambda$-effective dimension (Musco et al., 2016, Altschuler et al., 2018).
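A sketch of ridge-leverage-score sampling follows; purely for illustration it computes the exact scores by a dense solve (practical algorithms, such as recursive sampling, approximate them far more cheaply), and the test spectrum, regularization, and sample size are arbitrary choices:

```python
import numpy as np

def ridge_leverage_scores(A, lam):
    """Exact lambda-ridge leverage scores l_i = (A (A + lam I)^{-1})_{ii}."""
    n = A.shape[0]
    return np.diag(A @ np.linalg.inv(A + lam * np.eye(n)))

rng = np.random.default_rng(1)
n = 300
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
evals = 1.0 / (1 + np.arange(n)) ** 2           # polynomially decaying spectrum
A = (Q * evals) @ Q.T
lam = 1e-3
scores = ridge_leverage_scores(A, lam)
d_eff = scores.sum()                             # tr(A (A + lam I)^{-1})
# Sample landmark columns with probability proportional to the scores
l = 40
idx = rng.choice(n, size=l, replace=False, p=scores / scores.sum())
C = A[:, idx]
W = C[idx, :]
A_hat = C @ np.linalg.pinv(W) @ C.T
spec_err = np.linalg.norm(A - A_hat, 2)
```

The effective dimension summarizes how many samples the theory demands; for fast-decaying spectra it is far smaller than $n$, which is what makes the method economical.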
3. Extensions: Rank Truncation, Refinement, and Numerical Stability
After forming $\hat{A}$ of rank $\ell$, truncation to a target rank $k < \ell$ is commonly performed. The classical strategy is
$$\hat{A}_k = C\, W_k^{+}\, C^{\top},$$
where $W_k$ is the best rank-$k$ approximation to $W$. An optimal approach—strictly improving upon truncation at the level of $W$—is to perform a thin (economy-size) QR decomposition of $C$, transform the problem into the span of $C$, and then compute the best rank-$k$ approximation via SVD in this subspace (Pourkamali-Anaraki et al., 2017, Pourkamali-Anaraki et al., 2016). This yields lower error without additional leading-order cost.
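The two truncation strategies can be compared directly. The following sketch (on an illustrative exponentially decaying test matrix) forms both the classical $W$-truncated approximation and the projection-based one; an eigendecomposition of the small projected matrix stands in for the SVD, which is equivalent here since the projected matrix is symmetric:

```python
import numpy as np

def nystrom_rank_k(A, idx, k):
    """Rank-k Nystrom approximations: classical W-truncation vs. QR projection."""
    C = A[:, idx]
    W = C[idx, :]
    # Classical: truncate W to its best rank-k approximation, then C W_k^+ C^T
    w, V = np.linalg.eigh(W)
    order = np.argsort(w)[::-1]
    wk, Vk = w[order[:k]], V[:, order[:k]]
    A_classic = C @ (Vk / wk) @ Vk.T @ C.T
    # Improved: project into span(C) via thin QR, then best rank-k in that span
    Q, R = np.linalg.qr(C)                  # C = Q R with Q n x l orthonormal
    M = R @ np.linalg.pinv(W) @ R.T         # so  C W^+ C^T = Q M Q^T
    s, U = np.linalg.eigh((M + M.T) / 2)
    top = np.argsort(s)[::-1][:k]
    A_opt = (Q @ U[:, top] * s[top]) @ (Q @ U[:, top]).T
    return A_classic, A_opt

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
A = (Q * np.exp(-0.2 * np.arange(200))) @ Q.T   # exponentially decaying SPSD
idx = rng.choice(200, size=30, replace=False)
Ac, Ao = nystrom_rank_k(A, idx, k=10)
err_c = np.linalg.norm(A - Ac, "fro")
err_o = np.linalg.norm(A - Ao, "fro")
```

On matrices like this one the projected truncation matches or beats the classical one, consistent with the cited analysis.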
Algorithmic refinements address cases where standard Nyström approximations saturate at modest accuracy. High-accuracy schemes such as the HAN framework employ iterative pivoting and progressive sampling (via Schur-complement and strong rank-revealing factorizations), achieving errors approaching machine precision with only modest increases in sampled columns (Xia, 2023).
Numerical stability is governed by the conditioning of $W$. For nearly low-rank matrices, naive pseudo-inversion can amplify roundoff by factors proportional to the condition number $\kappa(W)$. Stabilized Nyström variants—using truncated Cholesky, careful regularization, or strong column selection—admit precise backward-stability guarantees and are necessary to ensure robust performance in large-scale applications (Bucci et al., 19 Nov 2025).
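A minimal sketch of one common stabilization: shift $W$ by a small multiple of its trace before Cholesky factorization. The shift size follows common practice in randomized Nyström implementations but is an assumption here; the cited papers use truncated Cholesky and strong rank-revealing pivoting instead.

```python
import numpy as np

def stable_nystrom(A, idx):
    """Nystrom factor B with A_hat = B B^T, stabilized by a trace-scaled shift.

    W is shifted by eps * trace(W) before factorization, a standard remedy
    when W is nearly singular (illustrative sketch, not the cited algorithms).
    """
    C = A[:, idx]
    W = C[idx, :]
    shift = np.finfo(A.dtype).eps * np.trace(W)
    L = np.linalg.cholesky(W + shift * np.eye(len(idx)))   # safely SPD
    B = np.linalg.solve(L, C.T).T                          # B = C L^{-T}
    return B

rng = np.random.default_rng(3)
G = rng.standard_normal((150, 8))
A = G @ G.T + 1e-10 * np.eye(150)      # numerically rank-8: W is ill-conditioned
idx = rng.choice(150, size=25, replace=False)
B = stable_nystrom(A, idx)
rel_err = np.linalg.norm(A - B @ B.T) / np.linalg.norm(A)
```

Working with the triangular factor $L$ rather than forming $W^{-1}$ explicitly avoids squaring the condition number and keeps the computation backward stable.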
4. Applications in Statistical and Numerical Computing
Machine Learning and Data Analysis
The Nyström method is integral to large-scale kernel machines, enabling low-rank approximations in kernel SVMs, kernel ridge regression, kernel PCA, and related learning algorithms. Nyström linearization allows conversion of non-linear kernel learning into efficient linear algebra by projecting inputs into a data-adaptive feature space derived from sampled kernel evaluations (Li, 2016, Hallgren, 2021). Advanced sampling (leverage scores, clustering) and modified rank-reduction strategies yield improved empirical and theoretical guarantees on approximation error and learning accuracy (Pourkamali-Anaraki et al., 2016, Pourkamali-Anaraki et al., 2017). Block-diagonal and boosting-based Nyström extensions further scale the approach to massive datasets, enabling efficient distributed/ensemble computation and improved tail-error control in heavy-tailed spectral regimes (Garg et al., 21 Jun 2025, Hamm et al., 2023).
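The linearization idea can be sketched as an explicit Nyström feature map for an RBF kernel, followed by ordinary linear ridge regression in the feature space (kernel, bandwidth, landmark count, and regularization below are illustrative choices):

```python
import numpy as np

def nystrom_features(X, landmarks, gamma):
    """Map inputs to a Nystrom feature space for the RBF kernel.

    phi(x) = W^{-1/2} [k(x, z_1), ..., k(x, z_l)], so phi(x) . phi(y) ~ k(x, y).
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    W = rbf(landmarks, landmarks)
    w, V = np.linalg.eigh(W)
    keep = w > 1e-10 * w.max()                  # drop numerically null directions
    return rbf(X, landmarks) @ (V[:, keep] / np.sqrt(w[keep]))

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 3))
Z = X[rng.choice(500, size=60, replace=False)]  # landmarks drawn from the data
Phi = nystrom_features(X, Z, gamma=0.5)
# Linear ridge regression in feature space approximates kernel ridge regression
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
coef = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(Phi.shape[1]), Phi.T @ y)
resid = np.linalg.norm(y - Phi @ coef) / np.linalg.norm(y)
```

The linear solve involves only the (at most) 60-dimensional feature space rather than the full 500-point kernel matrix, which is the source of the scalability gains described above.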
Signal Processing and Functional Approximation
In high-dimensional signal processing, Nyström-based estimators compute low-rank projections of covariance matrices, used for principal component estimation and adaptive beamforming. Nyström covariances exhibit beneficial eigenvalue shrinkage, improving mean-squared error relative to the classical sample covariance in sample-starved regimes (few observations relative to the ambient dimension), and lead to substantial computational savings: principal subspace extraction drops from the $O(n^3)$ cost of a full eigendecomposition to $O(n\ell^2)$ with $\ell$ sampled coordinates. Empirical studies show effectiveness in array beamforming and patch-based image denoising (Arcolano et al., 2011).
In stochastic process quantization, the Nyström method is employed to discretize the Karhunen–Loève (KL) decomposition of Gaussian processes, enabling efficient simulation and variance reduction in Monte Carlo methods for path-dependent financial derivatives. In the context of fractional Brownian motion, careful quadrature and Richardson extrapolation yield high-order convergence of the eigenvalue/eigenfunction approximations, which is essential to the accurate representation and quantization of infinite-dimensional Gaussian processes (Corlay, 2010).
Numerical PDEs and Integral Equations
In numerical analysis, the Nyström method underpins collocation-based schemes for discretizing integral operators and boundary integral equations. Spatially, Nyström discretization replaces continuous operators with high-accuracy quadrature applied to weakly singular integral kernels, forming the basis of high-order solvers for problems including Laplace, Helmholtz, and elasticity equations. Isogeometric variants leverage CAD/NURBS parametrizations with local kernel regularization strategies, achieving high-order or even spectral convergence for problems on complex geometries (Anand et al., 2017, Zechner et al., 2015). Parallel-in-time Sinc–Nyström discretizations, combined with block-diagonal low-rank preconditioners, permit efficient all-at-once solution of evolutionary PDEs by ensuring mesh-independent convergence and parallelism across time slices (Liu et al., 2021).
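A minimal Nyström collocation sketch for a second-kind Fredholm equation with a smooth separable kernel, manufactured so that the exact solution is known (the midpoint rule stands in for the high-order quadratures used in practice):

```python
import numpy as np

# Nystrom solver for the second-kind Fredholm equation
#     u(x) - \int_0^1 k(x, t) u(t) dt = f(x),    k(x, t) = x t,
# manufactured so that u(x) = x is exact (then f(x) = 2x/3).
n = 64
x = (np.arange(n) + 0.5) / n                 # midpoint quadrature nodes
w = np.full(n, 1.0 / n)                      # quadrature weights
K = np.outer(x, x)                           # k(x_i, t_j)
f = 2.0 * x / 3.0
u = np.linalg.solve(np.eye(n) - K * w, f)    # (I - K diag(w)) u = f
err = np.abs(u - x).max()
```

The discretization replaces the integral by its quadrature sum at the collocation nodes, turning the operator equation into a dense linear system whose solution converges at the order of the underlying quadrature rule.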
5. Theoretical Properties: Error, Coherence, and Complexity
Error properties of the Nyström approximation are analytically tractable. For an exactly low-rank $A$ with sufficiently many sampled columns, exact recovery is achieved. For general matrices, the error is governed by the interplay between spectral decay and subspace coherence. The coherence parameter
$$\mu = \frac{n}{r}\, \max_{1 \le i \le n} \big\| U_r(i, :) \big\|_2^2,$$
where $U_r$ collects the top-$r$ eigenvectors, quantifies the "peakedness" of the leading eigenvectors. High coherence necessitates more samples or advanced sampling schemes to capture the dominant eigenspaces (Talwalkar et al., 2014).
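The contrast between incoherent and maximally coherent eigenspaces can be checked numerically; this sketch uses the standard definition $\mu = (n/r)\max_i \|U_r(i,:)\|_2^2$, whose range is $[1, n/r]$:

```python
import numpy as np

def coherence(A, r):
    """Coherence mu = (n / r) * max_i || row i of U_r ||^2 of the top-r eigenspace."""
    n = A.shape[0]
    _, U = np.linalg.eigh(A)
    Ur = U[:, -r:]                          # top-r eigenvectors (eigh is ascending)
    return (n / r) * (Ur ** 2).sum(axis=1).max()

n, r = 256, 4
rng = np.random.default_rng(5)
# Incoherent case: random eigenvectors, mu stays close to its lower bound 1
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A_inc = (Q[:, :r] * np.arange(r, 0, -1)) @ Q[:, :r].T
# Coherent case: eigenvectors aligned with coordinate axes, mu = n / r (maximal)
A_coh = np.diag([4.0, 3.0, 2.0, 1.0] + [0.0] * (n - r))
mu_inc = coherence(A_inc, r)
mu_coh = coherence(A_coh, r)
```

In the coherent case all the eigenvector mass sits on $r$ coordinates, so uniform column sampling almost always misses it; this is exactly the regime where leverage-based sampling is required.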
Computational complexity is notably reduced: for $n \times n$ matrices where the best rank-$k$ approximant is desired, the total cost is $O(n\ell^2 + \ell^3)$ for classical Nyström with $\ell$ sampled columns, giving near-linear scaling in $n$. Advanced Block-Nyström schemes further reduce the preprocessing cost while maintaining relative-error guarantees (Garg et al., 21 Jun 2025). For functional problems, the main cost lies in matrix assembly, which is quadratic in the number of quadrature nodes, with leading eigensolvers scaling linearly in the number of requested eigenpairs.
6. Notable Advances and Empirical Findings
Empirical benchmarks reveal several consistent findings:
- For sufficiently low-coherence kernels and rapid spectral decay, uniform Nyström with a moderate number of sampled columns closely matches the full SVD in approximation quality and downstream learning accuracy (Talwalkar et al., 2010, Pourkamali-Anaraki et al., 2016).
- QR-based and randomized clustered variants offer order-of-magnitude improvements in the number of samples required, particularly for structured or high-dimensional data (Pourkamali-Anaraki et al., 2016).
- Stability-optimized variants with truncated Cholesky or max-vol sampling ensure reliable performance at high target rank, even under extreme ill-conditioning (Bucci et al., 19 Nov 2025).
- In both statistical and PDE contexts, advanced Nyström methods achieve high-precision approximations with cost and sample complexity essentially linear in data size, with empirical error curves tracking theoretical expectations (Arcolano et al., 2011, Xia, 2023, Corlay, 2010).
7. Outlook and Limitations
While the Nyström method offers computational and theoretical advantages for large-scale and high-dimensional problems, several limitations persist:
- The quality of approximation is sensitive to the sampling strategy; naïve uniform sampling can fail in high-coherence or high-leverage contexts.
- Numerical instability arises if the landmark intersection matrix is ill-conditioned, necessitating stabilization or truncation (Bucci et al., 19 Nov 2025).
- Achieving high precision for nonsymmetric or general rectangular operators requires iterative, pivoted, or progressive sampling rather than classical random subsets (Xia, 2023).
- Extension to highly oscillatory (high-frequency) kernels, non-smooth domains, or extreme high dimensionality continues to motivate research on adaptive, geometry-aware, and hybrid approaches.
The Nyström method remains a central technique in computational mathematics, enabling scalable algorithms across integral equations, kernel methods, signal estimation, and randomized matrix computations. Ongoing developments focus on optimization of sampling, robust stabilization, parallel and block structures, and integration with advanced statistical estimators.