Block-Nyström Method: Scalable Matrix Approximation
- Block-Nyström method is a low-rank matrix compression technique that partitions kernel matrices into blocks for efficient approximation.
- It employs ridge-leverage score sampling and block averaging to reduce computational cost while maintaining strong spectral approximation guarantees.
- The method is applied in scalable kernel methods, convex optimization preconditioning, and statistical learning, especially for matrices with slow spectral decay.
The Block-Nyström method is a compression and low-rank approximation technique for large matrices, especially positive semidefinite (psd) kernel matrices, that extends and generalizes the classical Nyström method. By partitioning the sketching and extension process into blocks, the method reduces computational complexity and memory usage while maintaining strong spectral approximation guarantees. Block-Nyström has become a central tool in scalable kernel methods, convex optimization (as a preconditioner for second-order solvers), and statistical learning, particularly when the spectrum of the underlying matrix decays slowly and the effective dimension is large (Nemtsov et al., 2013, Garg et al., 21 Jun 2025).
1. Conceptual Foundations and Notation
Let $A \in \mathbb{R}^{n \times n}$, or, in kernel learning, a kernel matrix $K \in \mathbb{R}^{n \times n}$, be a psd matrix. Classical Nyström constructs a low-rank approximation by selecting a subset of rows and columns (“landmarks”) and interpolating the matrix structure from a small sample block. Block-Nyström generalizes this by:
- Partitioning the total sample set into disjoint or overlapping blocks, each producing a separate Nyström approximation.
- Aggregating these blockwise approximations, usually by averaging, to obtain the final surrogate.
Given a regularization parameter $\lambda > 0$ and block parameters $q$ (number of blocks), $s$ (block size), and total landmark number $m = qs$, the algorithm leverages $\lambda$-ridge-leverage score sampling to form more computationally tractable sub-sketches (Garg et al., 21 Jun 2025).
2. Algorithmic Procedures
Block-Nyström for Kernel Matrices
Given a psd matrix $A \in \mathbb{R}^{n \times n}$ and regularization parameter $\lambda > 0$, the Block-Nyström algorithm proceeds as follows:
- Ridge-Leverage Score Sampling: Compute approximate $\lambda$-ridge-leverage scores $\ell_i(\lambda) = \big(A(A + \lambda I)^{-1}\big)_{ii}$ for the data. These guide the probability of selecting row/column indices for sketching.
- Block Partition: Sample $m = qs$ indices i.i.d. from the ridge-leverage distribution and divide them into $q$ blocks $S_1, \dots, S_q$, each of size $s$.
- Mini-Nyström Approximations: For each block $S_i$ with selection matrix $C_i \in \mathbb{R}^{n \times s}$, compute
$$\hat{A}_i = A C_i \big(C_i^{\top} A C_i\big)^{\dagger} C_i^{\top} A.$$
- Averaging: The overall approximation is
$$\hat{A} = \frac{1}{q} \sum_{i=1}^{q} \hat{A}_i.$$
This can also be expressed by stacking all landmarks into $C = [C_1, \dots, C_q] \in \mathbb{R}^{n \times m}$ and forming a block-diagonal core matrix $\operatorname{blkdiag}\big((C_1^{\top} A C_1)^{\dagger}, \dots, (C_q^{\top} A C_q)^{\dagger}\big)$, so that $\hat{A} = \tfrac{1}{q}\, A C \,\operatorname{blkdiag}(\cdot)\, C^{\top} A$.
The pseudocode and further block structure details are explicitly given in (Garg et al., 21 Jun 2025).
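The following is a minimal NumPy sketch of this procedure, not the authors' reference implementation: it uses exact $\lambda$-ridge-leverage scores and dense linear algebra for clarity, and the toy RBF kernel, parameter values, and variable names are illustrative assumptions.

```python
import numpy as np

def ridge_leverage_scores(A, lam):
    # Exact lambda-ridge-leverage scores l_i = (A (A + lam I)^{-1})_{ii}.
    # In practice these are only approximated; exact scores keep the sketch short.
    n = A.shape[0]
    return np.diag(A @ np.linalg.inv(A + lam * np.eye(n)))

def block_nystrom(A, lam, q, s, seed=0):
    """Block-Nystrom approximation of a dense psd matrix A (n x n)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p = np.maximum(ridge_leverage_scores(A, lam), 0.0)
    p /= p.sum()
    idx = rng.choice(n, size=q * s, replace=True, p=p)  # m = q*s landmarks
    A_hat = np.zeros_like(A)
    for i in range(q):
        S = idx[i * s:(i + 1) * s]            # indices of block i
        C = A[:, S]                           # n x s columns   A C_i
        W = A[np.ix_(S, S)]                   # s x s core      C_i^T A C_i
        A_hat += C @ np.linalg.pinv(W) @ C.T  # mini-Nystrom extension
    return A_hat / q                          # average over the q blocks

# Toy usage: RBF kernel on random data.
X = np.random.default_rng(0).normal(size=(300, 5))
A = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
A_hat = block_nystrom(A, lam=1e-2, q=4, s=20)
print(np.linalg.norm(A - A_hat, 2))  # spectral-norm approximation error
```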
General Matrix Compression and SVD/EVD
For a general matrix $A$, select a sample size $k$ and, by complete pivoting, reorder rows and columns to obtain the block form
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where the $k \times k$ block $A_{11}$ is the sample, and the goal is to construct a rank-$k$ approximation $\tilde{A}$ via Nyström-style extension of singular (or eigen-)vectors (Nemtsov et al., 2013).
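As a concrete illustration of the extension step, the sketch below assumes the pivoting has already placed a well-conditioned sample block in the top-left corner; the variable names and the rank-8 test matrix are illustrative, not taken from the cited paper.

```python
import numpy as np

def nystrom_extension_svd(A, k):
    """Rank-k Nystrom-style approximation of a general matrix A, assuming the
    leading k rows/columns (after complete pivoting) form the sample block."""
    A11, A12, A21 = A[:k, :k], A[:k, k:], A[k:, :k]
    U11, sig, V11t = np.linalg.svd(A11)
    inv_sig = np.diag(1.0 / sig)                    # assumes A11 is well-conditioned
    U = np.vstack([U11, A21 @ V11t.T @ inv_sig])    # extended left singular vectors
    V = np.vstack([V11t.T, A12.T @ U11 @ inv_sig])  # extended right singular vectors
    return U @ np.diag(sig) @ V.T                   # only the A22 block is approximated

# Exactness check on an exactly rank-8 matrix.
rng = np.random.default_rng(1)
G = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 150))
print(np.linalg.norm(G - nystrom_extension_svd(G, 8)))  # ~0 (up to round-off)
```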
3. Theoretical Guarantees and Error Bounds
Spectral Approximation
Block-Nyström offers the following guarantee: with the block size $s$ and total landmark budget $m = qs$ scaling (up to logarithmic factors) with the $\lambda$-effective dimension $d_\lambda = \operatorname{tr}\!\big(A(A + \lambda I)^{-1}\big)$, and with high probability,
$$\frac{1}{c}\,(A + \lambda I) \;\preceq\; \hat{A} + \lambda I \;\preceq\; A + \lambda I,$$
i.e., $\hat{A}$ is a $\lambda$-regularized $c$-approximation to $A$ (Garg et al., 21 Jun 2025).
The error in the classical block-Nyström compression of a general matrix $A$ is bounded by the norm of the Schur complement of the sample block,
$$\|A - \tilde{A}\| \;\le\; \big\|A_{22} - A_{21} A_{11}^{\dagger} A_{12}\big\|,$$
provided $A_{11}$ is well-conditioned (i.e., $\sigma_{\min}(A_{11})$ is not too small). If $A$ is exactly rank-$k$, the approximation is exact (Nemtsov et al., 2013).
Complexity
For kernel matrices, the per-block cost is $O(ns^2 + s^3)$, giving a total of $O\big(q(ns^2 + s^3)\big)$ for $m = qs$ landmarks. Compared to a classical Nyström sketch of size $m$, which costs $O(nm^2 + m^3)$, Block-Nyström is much cheaper for slowly decaying spectra, where the effective dimension $d_\lambda$ (and hence the required $m$) is large (Garg et al., 21 Jun 2025).
For a general matrix, computing the approximate SVD/EVD has cost $O(k^3)$ for the factorization of the sample block plus $O(k^2)$ per extended row or column, after forming the blocks $A_{11}$, $A_{12}$, and $A_{21}$ (Nemtsov et al., 2013).
4. Sampling, Conditioning, and Block Selection
The effectiveness of Block-Nyström relies on:
- Landmark Selection: Ridge-leverage score sampling targets the most informative directions relative to the $\lambda$-regularization. For general matrices, optimal selection uses rank-revealing QR (RRQR) factorization on a low-rank factorization of $A$ to identify maximally independent blocks (Nemtsov et al., 2013, Garg et al., 21 Jun 2025).
- Block Conditioning: For accurate results, the sample block $A_{11}$ (or the Nyström core blocks $C_i^{\top} A C_i$) must be well-conditioned. RRQR attempts to maximize the smallest singular value $\sigma_{\min}(A_{11})$, which directly impacts the accuracy and stability of the approximation; a pivoted-QR selection sketch follows this list. Empirically, larger $\sigma_{\min}(A_{11})$ correlates with exponentially smaller reconstruction error (Nemtsov et al., 2013).
- Parameter Tuning: The block size $s$ controls the bias, while the number of blocks $q$ modulates the variance in the tail of the spectrum. Optimal values depend on the desired trade-off between accuracy and computational cost; generally, $q$ and $s$ are chosen jointly so that the total landmark budget $m = qs$ tracks the $\lambda$-effective dimension $d_\lambda$, with larger $q$ permitting a smaller per-block size $s$ (Garg et al., 21 Jun 2025).
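A minimal pivoted-QR selection sketch, using SciPy's column-pivoted QR as a stand-in for a full RRQR; the two-pass row/column selection and the variable names are illustrative assumptions rather than the exact procedure of the cited papers.

```python
import numpy as np
from scipy.linalg import qr

def select_sample_block(A, k):
    """Pick k rows and k columns via column-pivoted QR so that the k x k
    sample block A11 is as well-conditioned as practical."""
    _, _, col_piv = qr(A, mode='economic', pivoting=True)              # column pivots
    cols = col_piv[:k]
    _, _, row_piv = qr(A[:, cols].T, mode='economic', pivoting=True)   # row pivots
    rows = row_piv[:k]
    A11 = A[np.ix_(rows, cols)]
    sigma_min = np.linalg.svd(A11, compute_uv=False)[-1]               # conditioning proxy
    return rows, cols, sigma_min
```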
5. Applications: Optimization and Statistical Learning
Preconditioning and Convex Optimization
Block-Nyström matrices serve as effective preconditioners for Hessian-based optimization. Given a convex objective $f$ with Hessian $H = \nabla^2 f(x)$, forming a Block-Nyström preconditioner $P = \hat{H} + \lambda I$ ensures
$$\frac{1}{c}\,(H + \lambda I) \;\preceq\; P \;\preceq\; H + \lambda I,$$
i.e., the preconditioned system has condition number at most $c$. This enables fast convergence in Newton-type or Nesterov-accelerated solvers, with iteration complexity scaling as $O\!\big(\sqrt{c}\,\log(1/\epsilon)\big)$ (Garg et al., 21 Jun 2025).
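The snippet below is a small demonstration of the preconditioning effect, reusing `A` and `A_hat` from the sketch in Section 2 and treating the regularized kernel system as a stand-in for a Hessian system; the value of `lam` and the CG settings are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

lam = 1e-2
n = A.shape[0]
b = np.random.default_rng(2).normal(size=n)
P = A_hat + lam * np.eye(n)                                          # Block-Nystrom preconditioner
M = LinearOperator((n, n), matvec=lambda v: np.linalg.solve(P, v))   # applies P^{-1}

plain_iters, pre_iters = [], []
cg(A + lam * np.eye(n), b, maxiter=500, callback=lambda xk: plain_iters.append(1))
cg(A + lam * np.eye(n), b, M=M, maxiter=500, callback=lambda xk: pre_iters.append(1))
print(len(plain_iters), len(pre_iters))   # preconditioned CG should need far fewer iterations
```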
Kernel Ridge Regression (KRR)
In KRR, Block-Nyström approximates the kernel matrix $K$ to solve the regularized system $(K + \lambda n I)\,\alpha = y$ efficiently. Under standard operator-capacity and smoothness assumptions, the excess risk satisfies
$$\mathcal{E}(\hat{f}) - \mathcal{E}(f^{*}) \;=\; O\!\Big(n^{-\frac{2r}{2r + \gamma}}\Big),$$
where $r$ is the source smoothness and $\gamma$ the capacity exponent. Block-Nyström thus incurs only a small extra factor in the prediction error rate while vastly reducing computational cost in high effective-dimension regimes (Garg et al., 21 Jun 2025).
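A toy comparison of exact KRR and Block-Nyström KRR, reusing `X`, `A`, and `A_hat` from the sketch in Section 2; the targets `y` and the regularization level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = A.shape[0]
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)               # toy regression targets
lam = 1e-2

alpha_exact = np.linalg.solve(A + lam * n * np.eye(n), y)         # exact KRR coefficients
alpha_block = np.linalg.solve(A_hat + lam * n * np.eye(n), y)     # Block-Nystrom KRR coefficients

pred_exact, pred_block = A @ alpha_exact, A_hat @ alpha_block     # in-sample predictions
print(np.linalg.norm(pred_exact - pred_block) / np.linalg.norm(pred_exact))
```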
6. Recursive Preconditioning and Inversion
Applying $(\hat{A} + \lambda I)^{-1}$ to a vector is central in statistical solvers and iterative optimization. The block-diagonal structure of Block-Nyström enables a recursive preconditioning scheme:
- At each recursion level, a Block-Nyström approximation with fewer blocks and a larger regularization parameter preconditions the system.
- The process recurses down to a base case solved via the standard Woodbury formula, maintaining efficiency and exploiting block sparsity. Under the stated sampling regime, one achieves near-optimal runtime for solving linear systems to $\epsilon$-accuracy (Garg et al., 21 Jun 2025); a sketch of the base-case Woodbury apply is given after this list.
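The sketch below covers only the non-recursive base case: given a factored Block-Nyström approximation $\hat{A} = Z Z^{\top}$, it applies $(\hat{A} + \lambda I)^{-1}$ to a vector with the Woodbury identity. Uniform landmark sampling and the eigendecomposition-based factorization are simplifying assumptions of this sketch.

```python
import numpy as np

def block_nystrom_factor(A, q, s, seed=0):
    """Return Z (n x q*s) with Z @ Z.T equal to the averaged Block-Nystrom
    approximation; uniform landmarks are used here for brevity."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    idx = rng.choice(n, size=q * s, replace=False)
    blocks = []
    for i in range(q):
        S = idx[i * s:(i + 1) * s]
        C, W = A[:, S], A[np.ix_(S, S)]
        w, V = np.linalg.eigh(W)                     # W^{-1/2} via eigendecomposition
        w = np.clip(w, 1e-12, None)
        blocks.append(C @ V @ np.diag(w ** -0.5) @ V.T / np.sqrt(q))
    return np.hstack(blocks)

def apply_inverse(Z, lam, v):
    """Compute (Z Z^T + lam*I)^{-1} v via Woodbury: an m x m solve instead of n x n."""
    m = Z.shape[1]
    inner = lam * np.eye(m) + Z.T @ Z
    return (v - Z @ np.linalg.solve(inner, Z.T @ v)) / lam
```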
7. Practical Guidance, Stability, and Limitations
Block-Nyström is most effective for approximately low-rank matrices, especially where the spectrum decays rapidly or polynomially. The sample size $k$ (or block size $s$) should target the “knee” of the spectral decay, just past the point where the singular values or eigenvalues drop sharply. The conditioning of each sampled block is critical; RRQR or greedy pivot sampling ensures that $\sigma_{\min}(A_{11})$ is maximized, which directly governs numerical stability and relative error.
For symmetric positive semidefinite kernels, especially those arising in machine learning, the single-step Block-Nyström (using a Cholesky factorization of the core block) is both computationally robust and efficient. If the matrix is exactly rank-$k$ (i.e., it is low-rank), Block-Nyström can yield an exact decomposition.
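A minimal sketch of this single-step, Cholesky-based Nyström factor; the tiny diagonal shift is a common numerical safeguard assumed here, not a prescription of the cited papers.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def nystrom_cholesky_factor(A, idx):
    """Single-sketch psd Nystrom factor F with A_hat = F @ F.T, using a
    Cholesky factorization of the (slightly shifted) core block for stability."""
    C = A[:, idx]                                   # n x s landmark columns
    W = A[np.ix_(idx, idx)]                         # s x s core block
    shift = np.finfo(A.dtype).eps * np.trace(A)     # tiny shift to keep W numerically pd
    L = cholesky(W + shift * np.eye(len(idx)), lower=True)
    return solve_triangular(L, C.T, lower=True).T   # F = C @ L^{-T}
```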
A plausible implication is that, in regimes of heavy-tailed spectra where the effective dimension is large, Block-Nyström’s divide-and-average structure achieves superior bias-variance tradeoffs compared to a single monolithic Nyström approximation, with improvements visible in both operator norm error and downstream learning tasks (Garg et al., 21 Jun 2025).
References:
- "Matrix Compression using the Nystroöm Method" (Nemtsov et al., 2013)
- "Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method" (Garg et al., 21 Jun 2025)