
Block-Nyström Method: Scalable Matrix Approximation

Updated 2 April 2026
  • The Block-Nyström method is a low-rank matrix compression technique that partitions kernel matrices into blocks for efficient approximation.
  • It employs ridge-leverage score sampling and block averaging to reduce computational cost while maintaining strong spectral approximation guarantees.
  • The method is applied in scalable kernel methods, convex optimization preconditioning, and statistical learning, especially for matrices with slow spectral decay.

The Block-Nyström method is a compression and low-rank approximation technique for large matrices, especially positive semidefinite (psd) kernel matrices, that extends and generalizes the classical Nyström method. By partitioning the sketching and extension process into blocks, the method reduces computational complexity and memory usage while maintaining strong spectral approximation guarantees. Block-Nyström has become a central tool in scalable kernel methods, convex optimization (as a preconditioner for second-order solvers), and statistical learning, particularly when the spectrum of the underlying matrix decays slowly and the effective dimension is large (Nemtsov et al., 2013, Garg et al., 21 Jun 2025).

1. Conceptual Foundations and Notation

Let $M \in \mathbb{R}^{m\times n}$ be a general matrix or, in kernel learning, let $K \in \mathbb{R}^{n\times n}$ be a psd kernel matrix. Classical Nyström constructs a low-rank approximation by selecting a subset of rows and columns ("landmarks") and interpolating the matrix structure from a small sample block. Block-Nyström generalizes this by:

  • Partitioning the total sample set into disjoint or overlapping blocks, each producing a separate Nyström approximation.
  • Aggregating these blockwise approximations, usually by averaging, to obtain the final surrogate.

Given a regularization parameter $\lambda > 0$ and block parameters $q$ (number of blocks), $b$ (block size), and total landmark count $m = qb$, the algorithm leverages ridge-leverage score sampling to form more computationally tractable sub-sketches (Garg et al., 21 Jun 2025).

2. Algorithmic Procedures

Block-Nyström for Kernel Matrices

Given $K \succeq 0$, the Block-Nyström algorithm proceeds as follows:

  1. Ridge-Leverage Score Sampling: Compute approximate $\alpha^2\lambda$-ridge-leverage scores $p_i$ for the data. These guide the probability of selecting row/column indices for sketching.
  2. Block Partition: Sample $m = qb$ indices i.i.d. according to the probabilities $p_i$ and divide them into $q$ blocks $I_1, \dots, I_q$, each of size $b$.
  3. Mini-Nyström Approximations: For each block $I_j$, compute the blockwise Nyström approximation

$$\hat{K}_j \;=\; K_{:,I_j}\,\big(K_{I_j,I_j}\big)^{\dagger}\,K_{I_j,:}.$$

  4. Averaging: The overall approximation is the blockwise average

$$\hat{K} \;=\; \frac{1}{q}\sum_{j=1}^{q}\hat{K}_j.$$

This can also be expressed as stacking all $m$ landmark columns into $K_{:,S}$ with $S = I_1 \cup \dots \cup I_q$ and using a block-diagonal core: $\hat{K} = \frac{1}{q}\,K_{:,S}\,\mathrm{blkdiag}\big(K_{I_1,I_1},\dots,K_{I_q,I_q}\big)^{\dagger}\,K_{S,:}$.

The pseudocode and further block structure details are explicitly given in (Garg et al., 21 Jun 2025).
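
To make the block structure concrete, the following is a minimal NumPy sketch of the averaging construction above. It uses uniform landmark sampling in place of ridge-leverage score sampling and the classical per-block Nyström formula; the function name `block_nystrom` and all parameters are illustrative rather than the paper's reference implementation.

```python
import numpy as np

def block_nystrom(K, q, b, rng=None):
    """Sketch of Block-Nystrom: sample q blocks of b landmarks each, form a
    classical Nystrom approximation per block, and average the results.
    Uniform sampling is used for simplicity; ridge-leverage score sampling
    would replace the `choice` call below."""
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    K_hat = np.zeros_like(K)
    for _ in range(q):
        idx = rng.choice(n, size=b, replace=False)    # landmark indices I_j
        C = K[:, idx]                                 # K_{:, I_j}
        W = K[np.ix_(idx, idx)]                       # K_{I_j, I_j}
        K_hat += C @ np.linalg.pinv(W) @ C.T          # mini-Nystrom term
    return K_hat / q                                  # blockwise average

# Toy usage on an RBF kernel matrix built from random 1-D data.
x = np.random.default_rng(0).normal(size=(500, 1))
K = np.exp(-0.5 * (x - x.T) ** 2)
K_hat = block_nystrom(K, q=4, b=25, rng=0)
print(np.linalg.norm(K - K_hat, 2) / np.linalg.norm(K, 2))
```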

General Matrix Compression and SVD/EVD

For a general matrix $M \in \mathbb{R}^{m\times n}$, select $k$ rows and $k$ columns and, by complete pivoting, reorder rows and columns to obtain the partition

$$M \;=\; \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

where $A \in \mathbb{R}^{k\times k}$ is the sample block, and the goal is to construct a rank-$k$ approximation $\tilde{M}$ via Nyström-style extension of singular (or eigen-)vectors (Nemtsov et al., 2013).
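
As an illustration of this general (non-symmetric) case, the sketch below forms the Nyström-style extension $\tilde{M} = \begin{pmatrix} A \\ C\end{pmatrix} A^{\dagger}\begin{pmatrix} A & B\end{pmatrix}$ from a chosen sample block. The index sets are passed in explicitly rather than obtained via complete pivoting or RRQR, and all names are illustrative.

```python
import numpy as np

def nystrom_extension(M, row_idx, col_idx):
    """Nystrom-style compression of a general matrix from a sample block.
    With rows/columns permuted so A = M[row_idx][:, col_idx] sits in the
    top-left corner, the approximation is [A; C] @ pinv(A) @ [A, B]."""
    A = M[np.ix_(row_idx, col_idx)]      # k x k sample block
    cols = M[:, col_idx]                 # [A; C] over all rows
    rows = M[row_idx, :]                 # [A, B] over all columns
    return cols @ np.linalg.pinv(A) @ rows

# Exactness check on an exactly rank-k matrix (as noted in Section 3):
# any sample block with full rank recovers M exactly.
rng = np.random.default_rng(1)
k = 5
M = rng.normal(size=(80, k)) @ rng.normal(size=(k, 60))
idx_r, idx_c = np.arange(k), np.arange(k)
print(np.allclose(M, nystrom_extension(M, idx_r, idx_c)))
```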

3. Theoretical Guarantees and Error Bounds

Spectral Approximation

Block-Nyström offers the following guarantee: with block size $b$ and number of blocks $q$ chosen appropriately relative to the $\lambda$-effective dimension, with high probability,

$$\frac{1}{C}\,(K + \lambda I) \;\preceq\; \hat{K} + \lambda I \;\preceq\; C\,(K + \lambda I)$$

for a modest constant $C$, i.e., $\hat{K}$ is a $\lambda$-regularized $C$-factor spectral approximation of $K$ (Garg et al., 21 Jun 2025).
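
The sandwich condition above can be checked empirically on small problems. The following sketch (illustrative names) computes the tightest factor $C$ for given $K$, $\hat{K}$, and $\lambda$ by reducing the two-sided bound to generalized eigenvalues.

```python
import numpy as np

def regularized_approx_factor(K, K_hat, lam):
    """Smallest C with (1/C)(K + lam*I) <= K_hat + lam*I <= C*(K + lam*I)
    in the psd order.  The bound holds iff all generalized eigenvalues of
    the pencil (K_hat + lam*I, K + lam*I) lie in [1/C, C]."""
    n = K.shape[0]
    L = np.linalg.cholesky(K + lam * np.eye(n))        # K + lam*I = L @ L.T
    Linv = np.linalg.inv(L)
    mu = np.linalg.eigvalsh(Linv @ (K_hat + lam * np.eye(n)) @ Linv.T)
    return max(mu.max(), 1.0 / mu.min())
```

Applied to the output of the `block_nystrom` sketch in Section 2 with the same $\lambda$, this returns the empirically measured approximation factor.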

The error of the classical block-Nyström compression of a general matrix $M$ is concentrated entirely in the unsampled block: with the partition above,

$$M - \tilde{M} \;=\; \begin{pmatrix} 0 & 0 \\ 0 & D - C A^{-1} B \end{pmatrix},$$

so the approximation error is controlled provided $A$ is well-conditioned (i.e., $\sigma_{\min}(A)$ is not too small). If $M$ is exactly rank-$k$, the Schur-complement term vanishes and the approximation is exact (Nemtsov et al., 2013).

Complexity

For kernel matrices, the per-block cost is $O(nb^2)$, giving a total of $O(qnb^2) = O(nmb)$ for $m = qb$ landmarks. Compared to a classical Nyström sketch of size $m$, which costs $O(nm^2)$, Block-Nyström is much cheaper for slowly decaying spectra, where the effective dimension (and hence the required $m$) is large (Garg et al., 21 Jun 2025).

For a general $M$, computing the approximate SVD/EVD costs $O\big((m+n)k^2 + k^3\big)$ after forming the $k \times k$ sample block (Nemtsov et al., 2013).

4. Sampling, Conditioning, and Block Selection

The effectiveness of Block-Nyström relies on:

  • Landmark Selection: Ridge-leverage score sampling targets the most informative directions relative to $\lambda$-regularization (a minimal sampling sketch follows this list). For general matrices, optimal selection uses rank-revealing QR (RRQR) factorization on a low-rank factorization of $M$ to identify maximally independent blocks (Nemtsov et al., 2013, Garg et al., 21 Jun 2025).
  • Block Conditioning: For accurate results, the sample block $A$ (or the Nyström blocks $K_{I_j,I_j}$) must be well-conditioned. RRQR attempts to maximize the smallest singular value $\sigma_{\min}(A)$, which directly impacts the accuracy and stability of the approximation. Empirically, larger $\sigma_{\min}(A)$ corresponds to exponentially smaller reconstruction error (Nemtsov et al., 2013).
  • Parameter Tuning: The block size $b$ controls the bias, while the number of blocks $q$ modulates the variance in the tail of the spectrum. Optimal values depend on the desired trade-off between accuracy and computational cost; generally, $b$ and $q$ are chosen so that the total landmark count $m = qb$ scales with the $\lambda$-effective dimension $d_\lambda = \mathrm{tr}\big(K(K + \lambda I)^{-1}\big)$ (Garg et al., 21 Jun 2025).
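
The landmark-selection step can be sketched as follows. The exact $O(n^3)$ leverage-score computation is for illustration only, since practical pipelines approximate the scores (e.g., recursively or via sketching); the function names `ridge_leverage_scores` and `sample_landmark_blocks` are hypothetical.

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    """Exact lambda-ridge-leverage scores l_i = [K (K + lam*I)^{-1}]_{ii}.
    Their sum is the effective dimension d_lambda = tr(K (K + lam*I)^{-1})."""
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * np.eye(n)))

def sample_landmark_blocks(K, lam, q, b, rng=None):
    """Draw q*b landmark indices i.i.d. with probability proportional to the
    ridge-leverage scores, then split them into q blocks of size b."""
    rng = np.random.default_rng(rng)
    scores = ridge_leverage_scores(K, lam)
    p = scores / scores.sum()
    idx = rng.choice(K.shape[0], size=q * b, replace=True, p=p)
    return idx.reshape(q, b)
```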

5. Applications: Optimization and Statistical Learning

Preconditioning and Convex Optimization

Block-Nyström matrices serve as effective preconditioners for Hessian-based optimization. Given a convex objective $f$ with Hessian $H$, forming a Block-Nyström preconditioner $P = \hat{H} + \lambda I$ (where $\hat{H}$ is the Block-Nyström approximation of $H$) ensures

$$\frac{1}{C}\,P \;\preceq\; H + \lambda I \;\preceq\; C\,P$$

for a modest constant $C$, and enables fast convergence in Newton-type or Nesterov-accelerated solvers, with iteration counts governed by the resulting constant relative condition number (Garg et al., 21 Jun 2025).
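
The cited paper uses such preconditioners inside Newton-type and accelerated solvers; as a simpler, hedged illustration of how a spectral preconditioner is consumed, the sketch below runs preconditioned conjugate gradient on $(H + \lambda I)x = b$, taking the preconditioner-inverse application as a callable. All names are illustrative.

```python
import numpy as np

def pcg(matvec, b, prec_apply, tol=1e-8, max_iter=200):
    """Preconditioned conjugate gradient for (H + lam*I) x = b, where
    matvec(v) applies H + lam*I and prec_apply(v) applies the inverse of
    a Block-Nystrom preconditioner P = H_hat + lam*I."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = prec_apply(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = prec_apply(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```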

Kernel Ridge Regression (KRR)

In KRR, Block-Nyström approximates the kernel matrix to solve the regularized system $(K + \lambda I)\,\alpha = y$ efficiently. Under standard operator-capacity and smoothness assumptions, parameterized by the source smoothness and the capacity exponent, the excess risk of the Block-Nyström estimator matches the rate of exact KRR up to a modest multiplicative factor. Block-Nyström thus preserves the prediction error rate, up to that factor, while vastly reducing computational cost in high effective-dimension regimes (Garg et al., 21 Jun 2025).
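
As a minimal illustration (not the paper's estimator or its rates), one can plug the approximate kernel matrix into the standard KRR normal equations. The dense solve below is for clarity only; a practical implementation would exploit $\hat{K}$'s low-rank structure, e.g., through the Woodbury identity sketched in Section 6. The function name and the placement of $\lambda$ are illustrative conventions.

```python
import numpy as np

def krr_fit_predict(K_hat, y, lam):
    """Kernel ridge regression with an approximated kernel matrix:
    solve (K_hat + lam*I) alpha = y and predict on the training points.
    The dense solve is illustrative; low-rank structure makes it cheaper."""
    n = K_hat.shape[0]
    alpha = np.linalg.solve(K_hat + lam * np.eye(n), y)
    return K_hat @ alpha
```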

6. Recursive Preconditioning and Inversion

Applying the regularized inverse $(\hat{K} + \lambda I)^{-1}$ to a vector is central in statistical solvers and iterative optimization. The block-diagonal structure of Block-Nyström enables a recursive preconditioning scheme:

  • At each recursion level, a Block-Nyström approximation with fewer blocks and a larger regularization parameter preconditions the system.
  • The process recurses to a base case solved via the standard Woodbury formula (see the sketch after this list), maintaining efficiency and exploiting block sparsity. Under the stated sampling regime, one achieves near-optimal runtime for solving linear systems to $\epsilon$-accuracy (Garg et al., 21 Jun 2025).
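
The Woodbury base case is straightforward once $\hat{K}$ is kept in factored form $\hat{K} = ZZ^{\top}$ with $Z \in \mathbb{R}^{n\times m}$. The sketch below (illustrative names, a single level only, no recursion) builds such a factor from the landmark blocks and applies $(\hat{K}+\lambda I)^{-1}$ without ever forming the $n \times n$ approximation.

```python
import numpy as np

def nystrom_factor(K, blocks):
    """Factor the Block-Nystrom approximation as K_hat = Z @ Z.T, with
    Z = [Z_1, ..., Z_q] / sqrt(q) and Z_j = K[:, I_j] @ pinv_sqrt(K[I_j, I_j])."""
    Zs = []
    for idx in blocks:
        W = K[np.ix_(idx, idx)]
        w, V = np.linalg.eigh(W)
        keep = w > 1e-10 * max(w.max(), 1e-30)        # drop tiny eigenvalues
        inv_sqrt = V[:, keep] @ np.diag(1.0 / np.sqrt(w[keep])) @ V[:, keep].T
        Zs.append(K[:, idx] @ inv_sqrt)
    return np.hstack(Zs) / np.sqrt(len(blocks))

def apply_regularized_inverse(Z, lam, v):
    """Apply (Z @ Z.T + lam*I)^{-1} to v via the Woodbury identity:
    (1/lam) * (v - Z (lam*I_m + Z.T @ Z)^{-1} Z.T v)."""
    m = Z.shape[1]
    small = lam * np.eye(m) + Z.T @ Z
    return (v - Z @ np.linalg.solve(small, Z.T @ v)) / lam
```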

7. Practical Guidance, Stability, and Limitations

Block-Nyström is most effective for approximately low-rank matrices, especially where the spectrum decays rapidly or polynomially. The sample size $m$ (or block size $b$) should target the "knee" of the spectral decay, just past where the singular or eigenvalues drop sharply (a simple heuristic is sketched below). The conditioning of each sampled block is critical; RRQR or greedy pivot sampling ensures $\sigma_{\min}$ of the sampled block is maximized, which directly governs numerical stability and relative error.
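
A simple heuristic for targeting the spectral knee, assuming approximate eigenvalues or singular values are available from a cheap sketch, might look like the following. The threshold `drop` is an illustrative tuning knob, not a prescription from the cited papers.

```python
import numpy as np

def sample_size_at_knee(spectrum, drop=1e-2):
    """Return a sample size just past the spectral 'knee': the first index
    where a (singular or eigen)value falls below `drop` times the largest."""
    s = np.sort(np.abs(np.asarray(spectrum)))[::-1]
    below = np.nonzero(s < drop * s[0])[0]
    return int(below[0]) if below.size else len(s)
```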

For symmetric positive semidefinite kernels, especially those arising in machine learning, the single-step Block-Nyström (using a Cholesky factorization of each sample block) is both computationally robust and efficient. If the matrix is exactly rank-$k$ (i.e., is genuinely low-rank), Block-Nyström can yield an exact decomposition.

A plausible implication is that, in regimes of heavy-tailed spectra where the effective dimension is large, Block-Nyström’s divide-and-average structure achieves superior bias-variance tradeoffs compared to a single monolithic Nyström approximation, with improvements visible in both operator norm error and downstream learning tasks (Garg et al., 21 Jun 2025).


References:

  • Nemtsov et al. (2013).
  • Garg et al. (21 Jun 2025).
