
Randomized Nyström Preconditioning

Updated 2 April 2026
  • Randomized Nyström preconditioning is a technique that uses randomized low-rank approximations to construct efficient preconditioners for large SPSD linear systems.
  • It enhances convergence rates by reducing the effective condition number through innovations like block and recursive extensions.
  • Adaptive rank selection and memory-efficient implementations make it practical for applications in kernel learning, PDE-constrained optimization, and inverse problems.

Randomized Nyström preconditioning is a class of algorithms that leverage randomized low-rank matrix approximations—specifically the Nyström method—to accelerate iterative solvers for large, symmetric positive (semi)definite (SPSD) linear systems, regularized least-squares, kernel machines, and related spectral or optimization problems. By stochastically generating a sketch of the input matrix, these methods construct preconditioners that dramatically improve convergence rates of Krylov-subspace and first-order methods, often with minimal memory and computational overhead. Recent developments have introduced recursive, block, and multi-level extensions to further enhance efficiency and applicability in high-dimensional regimes, operator-only settings, and ill-conditioned scenarios (Frangella et al., 2021, Garg et al., 21 Jun 2025, Dereziński et al., 2024, Chen et al., 30 Jan 2025, Hong et al., 2024, Abedsoltan et al., 2023).

1. Fundamentals of the Randomized Nyström Preconditioner

The classic randomized Nyström algorithm begins with a PSD matrix $A\in\mathbb{R}^{n\times n}$ and samples a test matrix $\Omega\in\mathbb{R}^{n\times \ell}$ with i.i.d. Gaussian (or other suitable random) entries. The sketch $C = A\Omega$ and the small Gram matrix $W = \Omega^\top A\Omega$ are computed. The Nyström approximation is

$$\widehat{A}_\mathrm{nys} = C W^\dagger C^\top,$$

yielding a rank-$\ell$ SPSD surrogate for $A$. The approximation error in operator norm is tightly controlled by the tail of the spectrum and the sketch size; with $\ell \gtrsim d_{\rm eff}(\mu)$ (the effective dimension at the regularization scale), the error satisfies $\mathbb{E}\,\|A - \widehat{A}_\mathrm{nys}\| \lesssim \mu$, ensuring the spectrum is captured up to the noise floor (Frangella et al., 2021).
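The sketch-and-approximate step above can be written in a few lines of NumPy. This is a minimal illustrative sketch, not the cited papers' implementation: the synthetic test matrix, the sketch size, and the use of a dense pseudoinverse are all arbitrary choices for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ell = 200, 30

# Synthetic SPSD test matrix with polynomially decaying spectrum (illustrative only).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = 1.0 / (1.0 + np.arange(n)) ** 2
A = (Q * eigs) @ Q.T

# Sketch: C = A @ Omega and the small Gram matrix W = Omega^T A Omega.
Omega = rng.standard_normal((n, ell))
C = A @ Omega
W = Omega.T @ C

# Rank-ell Nystrom approximation A_nys = C W^+ C^T. A pseudoinverse is used
# for clarity; production codes stabilize W (e.g., shifted Cholesky) instead.
A_nys = C @ np.linalg.pinv(W) @ C.T

# The operator-norm error should sit near the spectral tail lambda_{ell+1}.
err = np.linalg.norm(A - A_nys, 2)
```

With fast spectral decay, the error lands near the tail eigenvalue at index $\ell$, illustrating the bound quoted above.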

The resulting Nyström preconditioner for the regularized system $A+\mu I$ is typically formulated in factorized or deflation form:
$$P = \frac{1}{\hat\lambda_\ell + \mu}\, U(\hat\Lambda + \mu I)U^\top + (I - UU^\top),$$
where $U$ and $\hat\Lambda$ are the eigenvectors and eigenvalues of $\widehat{A}_\mathrm{nys}$, and $\hat\lambda_\ell$ (the smallest retained approximate eigenvalue) is a suitable shift ensuring regularity (Chen et al., 30 Jan 2025, Frangella et al., 2021).
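The deflation-form preconditioner can be applied without ever forming it densely, since $P$ and $P^{-1}$ share the eigenvectors $U$. The following NumPy sketch is illustrative (the helper name `apply_Pinv` and the dense round-trip self-check are for exposition only), assuming the standard deflation form with the smallest retained eigenvalue as the shift:

```python
import numpy as np

rng = np.random.default_rng(1)
n, ell, mu = 150, 25, 1e-3

# Illustrative SPSD matrix with polynomial spectral decay.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * (1.0 / (1.0 + np.arange(n)) ** 2)) @ Q.T

# Nystrom approximation and its eigendecomposition (U, lam).
Omega = rng.standard_normal((n, ell))
C = A @ Omega
A_nys = C @ np.linalg.pinv(Omega.T @ C) @ C.T
lam, U = np.linalg.eigh(A_nys)                 # ascending eigenvalues
lam, U = np.maximum(lam[-ell:], 0.0), U[:, -ell:]
shift = lam[0]                                 # smallest retained eigenvalue

def apply_Pinv(v):
    """Apply P^{-1} = (shift+mu) U (Lam + mu I)^{-1} U^T + (I - U U^T)."""
    Uv = U.T @ v
    return (shift + mu) * (U @ (Uv / (lam + mu))) + (v - U @ Uv)

# Self-check: build P densely from the same factors and verify P^{-1} P v = v.
P = (U * ((lam + mu) / (shift + mu))) @ U.T + (np.eye(n) - U @ U.T)
v = rng.standard_normal(n)
roundtrip = apply_Pinv(P @ v)
```

Only a rank-$\ell$ eigenbasis and the scalars `shift`, `mu` are stored, so applying $P^{-1}$ costs $O(n\ell)$ per vector.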

2. Block and Recursive Generalizations: Block-Nyström

Block-Nyström preconditioning, as introduced by Garg & Dereziński, partitions the landmark selection across multiple small independent Nyström factorizations. Given $q$ blocks, each block samples a landmark set $S_i$ (of size $\ell/q$), computes a small Nyström factor, and returns a block approximation $\widehat{A}_i$; the aggregated block-Nyström approximation is

$$\widehat{A}_\mathrm{blk} = \frac{1}{q}\sum_{i=1}^{q} \widehat{A}_i.$$

The preconditioner is then $P = \widehat{A}_\mathrm{blk} + \mu I$, and a recursive preconditioning scheme efficiently solves linear systems with $P$ via nested application of PCG and smaller block-Nyström models. This approach leverages variance reduction from averaging, enabling stronger tail spectral control within the same computational budget compared to classical Nyström with a single large sample (Garg et al., 21 Jun 2025).
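The block-averaging idea can be sketched as below. This toy NumPy example uses Gaussian sketches in place of actual landmark (column) selection for simplicity, and the block count and budget split are arbitrary; it only illustrates averaging several small independent Nyström approximations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, total, q = 300, 60, 4          # total sketch budget split across q blocks

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * (1.0 / (1.0 + np.arange(n)))) @ Q.T   # heavier spectral tail

def nystrom(M, ell, rng):
    """One small Nystrom factorization with an ell-column Gaussian sketch."""
    Omega = rng.standard_normal((M.shape[0], ell))
    C = M @ Omega
    return C @ np.linalg.pinv(Omega.T @ C, rcond=1e-10) @ C.T

# Block-Nystrom: average q independent approximations of size total/q each.
A_blk = sum(nystrom(A, total // q, rng) for _ in range(q)) / q
err_blk = np.linalg.norm(A - A_blk, 2)
```

Each Nyström approximation satisfies $\widehat{A}_i \preceq A$, so the average does as well, which the test below verifies numerically.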

Key complexity results:

  • Block-Nyström requires a total landmark budget on the order of the effective dimension $d_{\rm eff}(\mu)$ (scaled by the approximation parameter $\alpha$) and achieves a spectral approximation of the form

$$\tfrac{1}{\alpha}\,(A+\mu I) \preceq \widehat{A}_\mathrm{blk}+\mu I \preceq \alpha\,(A+\mu I),$$

with preconditioning quality scaling with the approximation parameter $\alpha$ (smaller $\alpha$ yields higher accuracy at higher cost; larger $\alpha$ trades accuracy for speed).

3. Spectral Properties, Condition Number, and Complexity

A central theoretical guarantee is that, with a sketch size proportional to the matrix's effective dimension at the regularization scale, the preconditioned system exhibits a constant or near-constant condition number: $\kappa\big(P^{-1/2}(A+\mu I)P^{-1/2}\big) \le c$, where $c$ is a universal constant (often small in practice), and $\kappa(\cdot)$ denotes the spectral condition number. This ensures PCG convergence in $O(\log(1/\epsilon))$ iterations, independent of the input condition number (Frangella et al., 2021, Chen et al., 30 Jan 2025). Such behavior contrasts with classical preconditioners, whose iteration count scales as $O(\sqrt{\kappa}\,\log(1/\epsilon))$.
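This flattening of the spectrum can be observed directly. The sketch below is illustrative (synthetic matrix, arbitrary sizes): because $P$ is diagonalized by the Nyström eigenvectors, $P^{-1/2}$ has a closed form, and we can compare the condition number before and after symmetric preconditioning.

```python
import numpy as np

rng = np.random.default_rng(3)
n, ell, mu = 200, 40, 1e-3

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * (1.0 / (1.0 + np.arange(n)) ** 2)) @ Q.T
Amu = A + mu * np.eye(n)

# Nystrom eigenfactors for the preconditioner.
Omega = rng.standard_normal((n, ell))
C = A @ Omega
lam, U = np.linalg.eigh(C @ np.linalg.pinv(Omega.T @ C) @ C.T)
lam, U = np.maximum(lam[-ell:], 0.0), U[:, -ell:]
shift = lam[0]

# P^{-1/2} in closed form (same eigenvectors as P), then the symmetrically
# preconditioned matrix P^{-1/2} (A + mu I) P^{-1/2}.
scale = np.sqrt((shift + mu) / (lam + mu))
Pinv_half = (U * (scale - 1.0)) @ U.T + np.eye(n)
M = Pinv_half @ Amu @ Pinv_half

evals = np.linalg.eigvalsh(M)
kappa_pre = evals.max() / evals.min()
kappa_orig = np.linalg.cond(Amu)
```

On this example the preconditioned condition number drops by more than an order of magnitude relative to the original system.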

Block and multi-level approaches additionally control the tail spectrum via hierarchical sketching (as in Multi-level Sketched Preconditioning (Dereziński et al., 2024)), reducing both per-iteration cost and total runtime, even as the input matrix exhibits heavy-tailed spectral decay.

Preconditioner Type | Condition Number (after PC) | Iteration Count (PCG)
Unpreconditioned | $\kappa(A+\mu I)$ | $O(\sqrt{\kappa}\,\log(1/\epsilon))$
Nyström (single/block) | $O(1)$ for $\ell \gtrsim d_{\rm eff}(\mu)$ | $O(\log(1/\epsilon))$
Multi-level | $O(1)$ (lower cost for large $n$) | $O(\log(1/\epsilon))$

4. Extensions: Multi-level, Operator-only, and Adaptive Rank Selection

Multi-level randomized Nyström preconditioning introduces additional levels of sketching to avoid expensive inversion of large sketched matrices. After constructing an initial Nyström preconditioner, subsequent smaller sketching matrices further precondition the subproblems arising in the application of the first preconditioner. This hierarchical approach yields optimal or near-optimal runtimes for:

  • General linear systems with a small number of large singular values (near-optimal runtime when the number $k$ of outlying singular values is small)
  • Regularized PSD systems, with runtime governed by the effective dimension $d_{\rm eff}(\mu)$ rather than the full condition number (Dereziński et al., 2024).

In operator-only settings (when the input matrix is accessed only via matrix-vector products, e.g., variational inverse problems or PDE-constrained optimization), randomized Nyström preconditioners can be computed on-the-fly via batched operator applications, GPU implementations, or autodiff (for Gramian matrices in PINNs). The algorithms are storage- and computation-efficient, often requiring only a moderate sketch size (a few hundred columns typically suffice in practice) and yielding high acceleration (Hong et al., 2024, Bioli et al., 16 May 2025).

Adaptive rank selection heuristics, such as power iteration error estimation or doubling strategies, further enhance practicality by obviating the need to know the effective dimension a priori (Frangella et al., 2021). The Nyström sketch size is incrementally increased until the residual error or spectral gap criterion is met, guaranteeing robust preconditioner performance with minimal overhead.
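A doubling strategy of this kind can be sketched as follows. The stopping rule (a power-iteration estimate of $\|A - \widehat{A}_\mathrm{nys}\|$ compared against $\mu$) and the helper `nystrom_with_estimate` are illustrative choices, not the specific heuristic of any cited paper:

```python
import numpy as np

rng = np.random.default_rng(5)
n, mu = 300, 1e-3

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * (1.0 / (1.0 + np.arange(n)) ** 2)) @ Q.T

def nystrom_with_estimate(A, Omega, iters=20):
    """Nystrom approximation plus a power-iteration estimate of ||A - A_nys||."""
    C = A @ Omega
    A_nys = C @ np.linalg.pinv(Omega.T @ C) @ C.T
    E = A - A_nys
    v = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        v = E @ v
        v /= np.linalg.norm(v)
    return A_nys, abs(v @ (E @ v))

# Doubling strategy: grow the sketch until the estimated error drops below mu.
ell = 8
Omega = rng.standard_normal((n, ell))
A_nys, est = nystrom_with_estimate(A, Omega)
while est > mu and ell < n:
    Omega = np.hstack([Omega, rng.standard_normal((n, ell))])  # double the sketch
    ell *= 2
    A_nys, est = nystrom_with_estimate(A, Omega)
```

Because earlier sketch columns are reused at each doubling, the total sketching cost stays within a constant factor of the final sketch size.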

5. Algorithmic Implementation and Practical Guidelines

A generic Nyström preconditioning pipeline comprises:

  1. Sketch Generation: Draw a Gaussian or structured test matrix $\Omega\in\mathbb{R}^{n\times\ell}$, form $C = A\Omega$, and compute $W = \Omega^\top A\Omega$.
  2. Low-rank Approximation: Obtain $\widehat{A}_\mathrm{nys} = C W^\dagger C^\top$; numerically stabilize via diagonal shifting and orthogonalization (e.g., Cholesky or QR).
  3. Preconditioner Construction: Form $P$ in factorized or block form; in block-Nyström, average multiple sketches.
  4. Application: Apply $P^{-1}$ to vectors using Woodbury-type formulas or block-diagonal structure, requiring only matrix-vector products and small matrix solves.
  5. Integration with Iterative Solver: Employ in PCG, Newton-CG, or first-order methods. Recursively preconditioned or multi-level schemes use nested PCG or block solves.
  6. Adaptive Tuning: Use residual-based stopping or error estimation to adjust the sketch size; tune the approximation parameter to balance cost and preconditioning strength.
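The pipeline can be sketched end-to-end in NumPy. This is an illustrative assembly under simplifying assumptions (synthetic matrix, dense linear algebra, a hand-rolled `pcg` loop and `apply_Pinv` helper rather than any library routine), comparing iteration counts with and without the preconditioner:

```python
import numpy as np

rng = np.random.default_rng(6)
n, ell, mu = 300, 40, 1e-4

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * (1.0 / (1.0 + np.arange(n)) ** 2)) @ Q.T
Amu = A + mu * np.eye(n)
b = rng.standard_normal(n)

# Steps 1-3: sketch, low-rank approximation, factored preconditioner.
Omega = rng.standard_normal((n, ell))
C = A @ Omega
lam, U = np.linalg.eigh(C @ np.linalg.pinv(Omega.T @ C) @ C.T)
lam, U = np.maximum(lam[-ell:], 0.0), U[:, -ell:]
shift = lam[0]

def apply_Pinv(v):
    """Step 4: Woodbury-style application of the preconditioner inverse."""
    Uv = U.T @ v
    return (shift + mu) * (U @ (Uv / (lam + mu))) + (v - U @ Uv)

def pcg(Minv, tol=1e-8, maxiter=2000):
    """Step 5: preconditioned conjugate gradient; returns solution, iterations."""
    x, r = np.zeros(n), b.copy()
    z = Minv(r)
    p, rz = z.copy(), r @ z
    for k in range(1, maxiter + 1):
        Ap = Amu @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

x_plain, it_plain = pcg(lambda v: v)   # unpreconditioned CG
x_nys, it_nys = pcg(apply_Pinv)        # Nystrom-preconditioned CG
```

Both runs reach the same solution; the preconditioned run does so in far fewer iterations, consistent with the condition-number discussion above.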

Parameter selection guidelines:

  • The approximation parameter ($\alpha$, or the sketch size $\ell$) controls the trade-off between computational cost and spectral quality.
  • Regularization should be set in accordance with numerical stability; small diagonal shifts stabilize matrix inversions in finite precision.
  • In block or multi-level settings, the number of blocks, inner sketch sizes, and recursion depth should be tuned to problem dimensions and spectral properties (Garg et al., 21 Jun 2025, Dereziński et al., 2024).

6. Applications in Optimization, Kernel Methods, and Scientific Computing

Randomized Nyström preconditioners are widely deployed in:

  • Kernel ridge regression, Gaussian process inference, and minimum-norm interpolation: Preconditioning the kernel (Gram) matrix accelerates gradient descent, conjugate gradient, or Newton-based solvers, with minimal memory (Abedsoltan et al., 2023, Garg et al., 21 Jun 2025).
  • Variational inverse problems and PDE-constrained optimization: On-the-fly, matrix-free Nyström preconditioning dramatically reduces iteration counts in ill-posed imaging and physics-informed learning tasks (Hong et al., 2024, Bioli et al., 16 May 2025).
  • Large-scale QP solvers and interior-point methods: Matrix-free randomized Nyström PCG accelerates Newton and KKT solves, yielding significant walltime reduction in convex QPs (Chu et al., 2024).
  • Sparse linear algebra: Two-level Nyström–Schur preconditioners integrate randomized low-rank correction into multilevel algebraic preconditioners for sparse SPD matrices, outperforming conventional incomplete factorizations (Daas et al., 2021).
  • Multiple right-hand side/regularization path problems and Gaussian sampling: Block Krylov subspace solvers with Nyström augmentation deliver simultaneous fast solution across a continuum of regularization parameters and batch Gaussian vector sampling (Chen et al., 30 Jan 2025).

7. Theoretical Advances and Empirical Validation

Rigorous bounds demonstrate that, with sketch size on the order of the effective dimension, randomized Nyström preconditioners deliver near-optimal spectral properties: flattening the spectrum, reducing condition numbers, and guaranteeing fast convergence. These guarantees are corroborated by the empirical speedups reported across the applications above.

In summary, randomized Nyström preconditioning constitutes a unifying and highly effective methodology for accelerating large-scale linear algebra, kernel learning, and scientific optimization, offering a principled trade-off between accuracy, speed, and memory across a wide spectrum of modern computational challenges.
