Randomized Nyström Preconditioning
- Randomized Nyström preconditioning is a technique that uses randomized low-rank approximations to construct efficient preconditioners for large SPSD linear systems.
- It improves convergence rates by reducing the effective condition number of the preconditioned system; block and recursive extensions strengthen this effect within a fixed computational budget.
- Adaptive rank selection and memory-efficient implementations make it practical for applications in kernel learning, PDE-constrained optimization, and inverse problems.
Randomized Nyström preconditioning is a class of algorithms that leverage randomized low-rank matrix approximations—specifically the Nyström method—to accelerate iterative solvers for large, symmetric positive (semi)definite (SPSD) linear systems, regularized least-squares, kernel machines, and related spectral or optimization problems. By stochastically generating a sketch of the input matrix, these methods construct preconditioners that dramatically improve convergence rates of Krylov-subspace and first-order methods, often with minimal memory and computational overhead. Recent developments have introduced recursive, block, and multi-level extensions to further enhance efficiency and applicability in high-dimensional regimes, operator-only settings, and ill-conditioned scenarios (Frangella et al., 2021, Garg et al., 21 Jun 2025, Dereziński et al., 2024, Chen et al., 30 Jan 2025, Hong et al., 2024, Abedsoltan et al., 2023).
1. Fundamentals of the Randomized Nyström Preconditioner
The classic randomized Nyström algorithm begins with a PSD matrix $A \in \mathbb{R}^{n \times n}$ and samples a test matrix $\Omega \in \mathbb{R}^{n \times \ell}$ with i.i.d. Gaussian (or other suitable random) entries. The sketch $Y = A\Omega$ and the small Gram matrix $\Omega^\top Y = \Omega^\top A \Omega$ are computed. The Nyström approximation is

$$\hat{A}_{\mathrm{nys}} = Y\,(\Omega^\top A \Omega)^{\dagger}\, Y^\top,$$

yielding a rank-$\ell$ SPSD surrogate for $A$. The approximation error in operator norm is tightly controlled by the tail of the spectrum and the sketch size; with $\ell = O(d_{\mathrm{eff}}(\mu))$ (the effective dimension $d_{\mathrm{eff}}(\mu) = \mathrm{tr}\big(A(A + \mu I)^{-1}\big)$ at the regularization scale $\mu$), the error satisfies $\|A - \hat{A}_{\mathrm{nys}}\| = O(\mu)$ with high probability, ensuring the spectrum is captured up to the noise floor (Frangella et al., 2021).
The resulting Nyström preconditioner for the regularized system $(A + \mu I)x = b$ is typically formulated in factorized or deflation form:

$$P = \frac{1}{\hat{\lambda}_\ell + \mu}\, U(\hat{\Lambda} + \mu I)U^\top + (I - UU^\top),$$

where $U$ and $\hat{\Lambda} = \mathrm{diag}(\hat{\lambda}_1, \dots, \hat{\lambda}_\ell)$ are the eigenvectors and eigenvalues of $\hat{A}_{\mathrm{nys}}$, and $\mu$ is a suitable shift ensuring regularity (Chen et al., 30 Jan 2025, Frangella et al., 2021).
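The construction fits in a few lines of NumPy. Below is a minimal sketch of the stabilized Nyström factorization and the preconditioner apply, following the algorithm of Frangella et al. (2021); the function names and the floating-point shift heuristic are illustrative choices, not a definitive implementation.

```python
import numpy as np

def nystrom_approx(A, ell, rng=None):
    """Rank-ell randomized Nystrom approximation of a PSD matrix A,
    returned as an eigendecomposition: A_hat = U @ np.diag(lam) @ U.T."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = A.shape[0]
    Omega, _ = np.linalg.qr(rng.standard_normal((n, ell)))  # orthonormal test matrix
    Y = A @ Omega                                           # sketch Y = A Omega
    nu = np.sqrt(n) * np.finfo(Y.dtype).eps * np.linalg.norm(Y, 2)  # stability shift
    Y_nu = Y + nu * Omega
    C = np.linalg.cholesky(Omega.T @ Y + nu * np.eye(ell))  # = chol(Omega^T Y_nu)
    B = np.linalg.solve(C, Y_nu.T).T                        # B = Y_nu C^{-T}
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s**2 - nu, 0.0)                        # undo the shift
    return U, lam

def apply_nystrom_pinv(U, lam, mu, x):
    """Apply P^{-1} x for P = U (Lam + mu I) U^T / (lam_ell + mu) + (I - U U^T)."""
    gamma = lam[-1] + mu          # lam is sorted in descending order
    Utx = U.T @ x
    return gamma * (U @ (Utx / (lam + mu))) + (x - U @ Utx)
```

Applying $P^{-1}$ costs only two skinny matrix products, so the per-iteration overhead inside a Krylov method is $O(n\ell)$.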
2. Block and Recursive Generalizations: Block-Nyström
Block-Nyström preconditioning, as introduced by Garg & Dereziński, partitions the landmark selection across multiple small independent Nyström factorizations. Given $q$ blocks, each block samples a landmark set $S_i$ (of size $m$), computes a small Nyström factor, and returns $\hat{A}^{(i)} = A_{:,S_i}\,(A_{S_i,S_i})^{\dagger}\,A_{S_i,:}$; the aggregated block-Nyström approximation is

$$\hat{A}_{\mathrm{block}} = \frac{1}{q} \sum_{i=1}^{q} \hat{A}^{(i)}.$$

The preconditioner is then $P = \hat{A}_{\mathrm{block}} + \mu I$, and a recursive preconditioning scheme efficiently solves linear systems with $P$ via nested application of PCG and smaller block-Nyström models. This approach leverages variance reduction from averaging, enabling stronger tail spectral control within the same computational budget compared to classical Nyström with a single large sample (Garg et al., 21 Jun 2025).
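A minimal sketch of the block construction follows, assuming uniform landmark sampling and a plain average of the per-block models (the sampling distribution and weighting in Garg et al., 21 Jun 2025 may differ); the plain inner CG solve stands in for the full recursive scheme, which would itself precondition that solve with smaller block models.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def block_nystrom(A, q, m, rng=None):
    """q independent landmark-based Nystrom models A_hat^(i) = C_i W_i^+ C_i^T;
    uniform landmark sampling is a simplifying assumption here."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = A.shape[0]
    blocks = []
    for _ in range(q):
        S = rng.choice(n, size=m, replace=False)   # landmark set S_i
        C = A[:, S]                                # n x m column block
        Wpinv = np.linalg.pinv(A[np.ix_(S, S)])    # (A_{S_i,S_i})^+
        blocks.append((C, Wpinv))
    return blocks

def apply_P(blocks, mu, x):
    """Matrix-vector product with P = (1/q) * sum_i A_hat^(i) + mu * I."""
    y = mu * x
    for C, Wpinv in blocks:
        y = y + C @ (Wpinv @ (C.T @ x)) / len(blocks)
    return y

def solve_with_P(blocks, mu, b):
    """Inner solve P z = b via CG, standing in for the nested recursive scheme."""
    n = b.shape[0]
    Pop = LinearOperator((n, n), matvec=lambda v: apply_P(blocks, mu, v))
    z, info = cg(Pop, b, rtol=1e-10)    # rtol requires SciPy >= 1.12 (tol= before)
    return z
```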
Key complexity results:
- Block-Nyström requires $O(d_{\mathrm{eff}}(\mu))$ total landmarks across the blocks and achieves a spectral approximation of the form

$$\tfrac{1}{\Delta}\,(A + \mu I) \;\preceq\; \hat{A}_{\mathrm{block}} + \mu I \;\preceq\; \Delta\,(A + \mu I),$$

with preconditioning quality scaling with the approximation parameter $\Delta \ge 1$ (smaller $\Delta$ yields higher accuracy at higher cost; larger $\Delta$ trades accuracy for speed).
3. Spectral Properties, Condition Number, and Complexity
A central theoretical guarantee is that, with a sketch size proportional to the matrix's effective dimension at the regularization scale, the preconditioned system exhibits a constant or near-constant condition number:

$$\kappa_2\!\left(P^{-1/2}(A + \mu I)\,P^{-1/2}\right) \le C,$$

where $C$ is a universal constant (often below 10 in practice), provided $\ell \gtrsim d_{\mathrm{eff}}(\mu)$. This ensures PCG convergence in $O(\log(1/\epsilon))$ iterations, independent of the input condition number (Frangella et al., 2021, Chen et al., 30 Jan 2025). Such behavior contrasts with classical preconditioners, whose iteration count scales as $O(\sqrt{\kappa}\,\log(1/\epsilon))$.
Block and multi-level approaches additionally control the tail spectrum via hierarchical sketching (as in Multi-level Sketched Preconditioning (Dereziński et al., 2024)), reducing both per-iteration cost and total runtime, even as the input matrix exhibits heavy-tailed spectral decay.
| Preconditioner Type | Condition Number (after PC) | Iteration Count (PCG) |
|---|---|---|
| Unpreconditioned | $\kappa(A + \mu I)$ | $O(\sqrt{\kappa}\,\log(1/\epsilon))$ |
| Nyström (single/block) | $O(1)$ for $\ell \gtrsim d_{\mathrm{eff}}(\mu)$ | $O(\log(1/\epsilon))$ |
| Multi-level | $O(1)$ (lower for larger sketch budgets) | $O(\log(1/\epsilon))$ |
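The clustering behavior in the table can be checked numerically. The snippet below is an illustrative experiment (not drawn from the cited papers): it builds an SPSD matrix with rapid spectral decay and compares the condition number of the regularized system before and after preconditioning, reusing `nystrom_approx` from the Section 1 sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, ell = 500, 1e-4, 60
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * 2.0 ** -np.arange(n)) @ Q.T      # eigenvalues 1, 1/2, 1/4, ... (fast decay)

U, lam = nystrom_approx(A, ell, rng)      # from the Section 1 sketch
gamma = lam[-1] + mu
Pinv = gamma * (U / (lam + mu)) @ U.T + np.eye(n) - U @ U.T

ev = np.linalg.eigvals(Pinv @ (A + mu * np.eye(n))).real  # similar to an SPD matrix
print("kappa before:", np.linalg.cond(A + mu * np.eye(n)))
print("kappa after: ", ev.max() / ev.min())
```

Here $d_{\mathrm{eff}}(\mu)$ is roughly 14, so $\ell = 60$ is comfortably sufficient and the reported condition number should drop from about $10^4$ to a small constant.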
4. Extensions: Multi-level, Operator-only, and Adaptive Rank Selection
Multi-level randomized Nyström preconditioning introduces additional levels of sketching to avoid expensive inversion of large sketched matrices. After constructing an initial Nyström preconditioner, subsequent smaller sketching matrices further precondition the subproblems arising in the application of the first preconditioner. This hierarchical approach yields optimal or near-optimal runtimes for:
- General linear systems with a small number of large singular values ($\tilde{O}(n^2 + k^{\omega})$ for $k$ outlying singular values)
- Regularized PSD systems with runtime $\tilde{O}\big(n^2 + d_{\mathrm{eff}}(\mu)^{\omega}\big)$, where $\omega$ denotes the matrix multiplication exponent (Dereziński et al., 2024).
In operator-only settings (when the input matrix is accessed only via matrix-vector products, e.g., variational inverse problems or PDE-constrained optimization), randomized Nyström preconditioners can be computed on-the-fly via batched operator applications, GPU implementations, or autodiff (for Gramian matrices in PINNs). The algorithms are storage and computation efficient, often requiring only a moderate sketch size (a few hundred columns typically suffice in practice) and yielding high acceleration (Hong et al., 2024, Bioli et al., 16 May 2025).
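A hedged sketch of this matrix-free variant is shown below: the factorization touches $A$ only through a user-supplied `matvec` callback, so it applies equally to assembled matrices, PDE operators, or autodiff-defined Gramians; batching and GPU details are omitted.

```python
import numpy as np

def nystrom_from_matvec(matvec, n, ell, rng=None):
    """Nystrom factors (U, lam) using only ell matrix-vector products with A.
    matvec: callable v -> A @ v."""
    if rng is None:
        rng = np.random.default_rng(0)
    Omega, _ = np.linalg.qr(rng.standard_normal((n, ell)))
    Y = np.column_stack([matvec(Omega[:, j]) for j in range(ell)])  # ell matvecs
    nu = np.sqrt(n) * np.finfo(Y.dtype).eps * np.linalg.norm(Y, 2)  # stability shift
    C = np.linalg.cholesky(Omega.T @ Y + nu * np.eye(ell))
    B = np.linalg.solve(C, (Y + nu * Omega).T).T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U, np.maximum(s**2 - nu, 0.0)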
Adaptive rank selection heuristics, such as power iteration error estimation or doubling strategies, further enhance practicality by obviating the need to know the effective dimension a priori (Frangella et al., 2021). The Nyström sketch size is incrementally increased until the residual error or spectral gap criterion is met, guaranteeing robust preconditioner performance with minimal overhead.
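One possible doubling loop is sketched below (illustrative; it recomputes the sketch from scratch at each doubling, whereas practical codes extend the existing sketch). The error estimate is a power iteration on the residual operator $A - \hat{A}_{\mathrm{nys}}$, and the loop stops once the estimate falls below the regularization $\mu$; it reuses `nystrom_from_matvec` from the sketch above.

```python
import numpy as np

def adaptive_nystrom(matvec, n, mu, ell0=16, ell_max=1024, rng=None):
    """Grow the sketch until the estimated error ||A - A_hat|| <= mu."""
    if rng is None:
        rng = np.random.default_rng(0)
    ell = ell0
    while True:
        U, lam = nystrom_from_matvec(matvec, n, ell, rng)
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        for _ in range(20):            # power iteration on E = A - A_hat
            w = matvec(v) - U @ (lam * (U.T @ v))
            err = np.linalg.norm(w)
            v = w / err
        if err <= mu or ell >= ell_max:
            return U, lam
        ell *= 2
```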
5. Algorithmic Implementation and Practical Guidelines
A generic Nyström preconditioning pipeline comprises the following steps (an end-to-end sketch follows the list):
- Sketch Generation: Draw a Gaussian or structured test matrix $\Omega \in \mathbb{R}^{n \times \ell}$, form $Y = A\Omega$, compute $\Omega^\top Y$.
- Low-rank Approximation: Obtain $\hat{A}_{\mathrm{nys}} = Y(\Omega^\top Y)^{\dagger} Y^\top$; numerically stabilize via diagonal shifting and orthogonalization (e.g., Cholesky or QR).
- Preconditioner Construction: Form $P$ in factorized or block form; in block-Nyström, average multiple sketches.
- Application: Apply $P^{-1}$ to vectors using Woodbury-type formulas or block-diagonal structure, requiring only matrix-vector products and small matrix solves.
- Integration with Iterative Solver: Employ in PCG, Newton-CG, or first-order methods. Recursively preconditioned or multi-level schemes use nested PCG or block solves.
- Adaptive Tuning: Use residual-based stopping or error estimation to adjust the sketch size; trade off the approximation parameter to balance cost against preconditioning strength.
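The pieces above combine into a short end-to-end solve. The following sketch (reusing `nystrom_approx` from the Section 1 sketch, with synthetic data) wraps the preconditioner as a SciPy `LinearOperator` and passes it to PCG.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(2)
n, mu, ell = 2000, 1e-3, 150
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * np.arange(1, n + 1) ** -2.0) @ Q.T    # polynomially decaying spectrum
b = rng.standard_normal(n)

U, lam = nystrom_approx(A, ell, rng)           # from the Section 1 sketch
gamma = lam[-1] + mu

def apply_Pinv(x):
    Utx = U.T @ x
    return gamma * (U @ (Utx / (lam + mu))) + (x - U @ Utx)

Aop = LinearOperator((n, n), matvec=lambda x: A @ x + mu * x)
M = LinearOperator((n, n), matvec=apply_Pinv)
x, info = cg(Aop, b, M=M, rtol=1e-8)           # rtol requires SciPy >= 1.12
print("converged:", info == 0,
      "residual:", np.linalg.norm(A @ x + mu * x - b))
```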
Parameter selection guidelines:
- The approximation parameter ($\Delta$, or the sketch size $\ell$) controls the trade-off between computational cost and spectral quality.
- Regularization should be set in accordance with numerical stability; small diagonal shifts stabilize matrix inversions in finite precision.
- In block or multi-level settings, the number of blocks, inner sketch sizes, and recursion depth should be tuned to problem dimensions and spectral properties (Garg et al., 21 Jun 2025, Dereziński et al., 2024).
6. Applications in Optimization, Kernel Methods, and Scientific Computing
Randomized Nyström preconditioners are widely deployed in:
- Kernel ridge regression, Gaussian process inference, and minimum-norm interpolation: Preconditioning the kernel (Gram) matrix accelerates gradient descent, conjugate gradient, or Newton-based solvers, with minimal memory (Abedsoltan et al., 2023, Garg et al., 21 Jun 2025); see the sketch after this list.
- Variational inverse problems and PDE-constrained optimization: On-the-fly, matrix-free Nyström preconditioning dramatically reduces iteration counts in ill-posed imaging and physics-informed learning tasks (Hong et al., 2024, Bioli et al., 16 May 2025).
- Large-scale QP solvers and interior-point methods: Matrix-free randomized Nyström PCG accelerates Newton and KKT solves, yielding significant walltime reduction in convex QPs (Chu et al., 2024).
- Sparse linear algebra: Two-level Nyström–Schur preconditioners integrate randomized low-rank correction into multilevel algebraic preconditioners for sparse SPD matrices, outperforming conventional incomplete factorizations (Daas et al., 2021).
- Multiple right-hand side/regularization path problems and Gaussian sampling: Block Krylov subspace solvers with Nyström augmentation deliver simultaneous fast solution across a continuum of regularization parameters and batch Gaussian vector sampling (Chen et al., 30 Jan 2025).
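As a concrete kernel example, the sketch below preconditions a Gaussian-kernel ridge regression solve $(K + n\lambda I)\alpha = y$, reusing `nystrom_approx` from the Section 1 sketch; the data, bandwidth, and regularization are hypothetical stand-ins chosen for illustration.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
n, d, lam_reg, ell = 3000, 10, 1e-6, 200              # hypothetical sizes
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

K = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * d))   # Gaussian (RBF) kernel matrix
mu = n * lam_reg                                      # solve (K + n*lambda*I) alpha = y

U, lam = nystrom_approx(K, ell, rng)                  # from the Section 1 sketch
gamma = lam[-1] + mu

def apply_Pinv(v):
    Utv = U.T @ v
    return gamma * (U @ (Utv / (lam + mu))) + (v - U @ Utv)

Kop = LinearOperator((n, n), matvec=lambda v: K @ v + mu * v)
alpha, info = cg(Kop, y, M=LinearOperator((n, n), matvec=apply_Pinv), rtol=1e-8)
```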
7. Theoretical Advances and Empirical Validation
Rigorous bounds demonstrate that, with sketch size on the order of the effective dimension, randomized Nyström preconditioners deliver near-optimal spectral properties—flattening the spectrum, reducing condition numbers, and guaranteeing fast convergence. Key results include:
- Spectral bounds: For $\ell \gtrsim d_{\mathrm{eff}}(\mu)$, all but $\ell$ eigenvalues of the preconditioned system cluster tightly around one, and the condition number is constant with high probability (Frangella et al., 2021, Garg et al., 21 Jun 2025, Hong et al., 2024, Chen et al., 30 Jan 2025).
- Sharp complexity: Recursion and block-averaging yield near-linear or quasi-linear total runtime for many classes of dense and structured problems, breaking the runtime bottlenecks of both classical iterative and direct algorithms (Garg et al., 21 Jun 2025, Dereziński et al., 2024).
- Empirical results: Across synthetic and real datasets in regression, imaging, and scientific ML, randomized Nyström preconditioning consistently reduces Krylov or gradient-descent iteration counts by one to two orders of magnitude, with negligible increase in per-iteration cost (Frangella et al., 2021, Bioli et al., 16 May 2025, Hong et al., 2024, Chu et al., 2024).
In summary, randomized Nyström preconditioning constitutes a unifying and highly effective methodology for accelerating large-scale linear algebra, kernel learning, and scientific optimization, offering a principled trade-off between accuracy, speed, and memory across a wide spectrum of modern computational challenges.