
Randomized Sketch-and-Project Methods

Updated 14 November 2025
  • Randomized sketch-and-project methods are iterative algorithms that use random projections to efficiently solve linear systems, least squares problems, and matrix equations.
  • They achieve global Q-linear convergence in expectation, and their analysis unifies classical methods such as Kaczmarz and coordinate descent with modern random matrix theory.
  • Selecting appropriate sketching matrices (e.g., Gaussian, SRHT, sparse sketches) is key to balancing computational cost and embedding guarantees for large-scale problems.

Randomized sketch-and-project methods refer to a broad and powerful class of iterative randomized algorithms for solving linear systems, least squares problems, linear feasibility, matrix equations, and related tasks in scientific computing. These algorithms leverage random projections (sketching) to reduce computational and storage complexity, while maintaining precise analytic control of convergence rates and solution quality. Developed and unified over the past decade, they bridge classical row- and coordinate-style methods such as Kaczmarz and coordinate descent, randomized block and Newton strategies, and modern subspace projection schemes grounded in random matrix theory and optimal embedding guarantees.

1. Formulation and Unified Framework

Given a linear system $Ax = b$ with $A \in \mathbb{R}^{m \times n}$, randomized sketch-and-project methods proceed by iteratively projecting the current iterate $x^k$ onto the solution set of a randomly sketched subsystem. In its most general form, the update is

$$x^{k+1} = \arg\min_{x \in \mathbb{R}^n} \|x - x^k\|_W^2 \quad \text{subject to } S^T A x = S^T b,$$

where $W \succ 0$ is a user-chosen geometry (weight) matrix, $S \in \mathbb{R}^{m \times q}$ is a random sketching matrix (often $q \ll m$), and the norm is $\|z\|_W^2 = z^T W z$ (Gower et al., 2015).

This update admits multiple equivalent interpretations:

  • Sketch-and-project: Project $x^k$ in the $W$-norm onto the affine subspace defined by the sketched equations.
  • Random update: Closed form,

$$x^{k+1} = x^k - W^{-1} A^T S \,(S^T A W^{-1} A^T S)^\dagger\, S^T (A x^k - b).$$

  • Random linear solve: Solve a small projected problem.
  • Randomized intersect and fixed-point forms: The update corresponds to intersection of affine spaces and fixed-point contraction.

Key classical methods arise as special cases:

  • Randomized Kaczmarz (RK): $W = I$, $S = e_i$ (projection onto a single row).
  • Randomized coordinate descent (CD): $W = A$, $S = e_i$, for $A \succ 0$.
  • Randomized Newton/block Kaczmarz: block column selections for $S$.
  • Gaussian Kaczmarz/pursuit: $S \sim \mathcal{N}(0, I)$ (Gower et al., 2015, Gower, 2016).
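The closed-form update above takes only a few lines of NumPy. The snippet below is a minimal sketch under the assumption $W = I$ with uniform single-row sampling $S = e_i$ (i.e., randomized Kaczmarz), applied to a synthetic consistent system:

```python
import numpy as np

def sketch_and_project_step(A, b, x, S, W_inv=None):
    """One update x+ = x - W^{-1} A^T S (S^T A W^{-1} A^T S)^+ S^T (A x - b)."""
    if W_inv is None:
        W_inv = np.eye(A.shape[1])          # default geometry W = I
    G = S.T @ A @ W_inv @ A.T @ S           # small q-by-q sketched Gram matrix
    return x - W_inv @ A.T @ S @ np.linalg.pinv(G) @ S.T @ (A @ x - b)

rng = np.random.default_rng(0)
m, n = 50, 10
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)
b = A @ x_star                              # consistent system with solution x_star

# Randomized Kaczmarz: W = I and S = e_i with a uniformly sampled row i.
x = np.zeros(n)
for _ in range(2000):
    S = np.zeros((m, 1))
    S[rng.integers(m), 0] = 1.0
    x = sketch_and_project_step(A, b, x, S)

err = np.linalg.norm(x - x_star)
```

Swapping in a different $S$ (a block of coordinate vectors, or a Gaussian matrix) recovers the block Kaczmarz and Gaussian Kaczmarz variants without changing the step function.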

For more complex problems such as matrix equations $AXB = C$, (block) sketch-and-project methods project in a matrix-weighted Frobenius norm onto the solution set of doubly-sketched equations, yielding updates of the form

$$X^{k+1} = X^k - Z_1' (X^k - X^*) Z_2,$$

where $Z_1', Z_2$ are built from left and right sketches of $A$ and $B$, respectively (Bao et al., 2023).
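One way to realize the doubly-sketched update is to project $X^k$ onto $\{X : S_1^T A X B S_2 = S_1^T C S_2\}$ in the plain (identity-weighted) Frobenius norm; the closed form below is a sketch under that simplifying assumption, with illustrative sizes and Gaussian sketches:

```python
import numpy as np

rng = np.random.default_rng(6)

def matrix_sketch_step(A, B, C, X, S1, S2):
    """Frobenius-norm projection of X onto {X : S1^T A X B S2 = S1^T C S2}
    (identity weighting assumed)."""
    L = A.T @ S1 @ np.linalg.pinv(S1.T @ A @ A.T @ S1)
    R = np.linalg.pinv(S2.T @ B.T @ B @ S2) @ S2.T @ B.T
    return X - L @ (S1.T @ (A @ X @ B - C) @ S2) @ R

m, p, r, q2 = 20, 5, 5, 20
A = rng.standard_normal((m, p))          # full column rank w.h.p.
B = rng.standard_normal((r, q2))         # full row rank w.h.p.
X_star = rng.standard_normal((p, r))
C = A @ X_star @ B

X = np.zeros((p, r))
for _ in range(1500):
    S1 = rng.standard_normal((m, 3))     # independent left/right Gaussian sketches
    S2 = rng.standard_normal((q2, 3))
    X = matrix_sketch_step(A, B, C, X, S1, S2)

err = np.linalg.norm(X - X_star)
```

Here the correction term plays the role of $Z_1' (X^k - X^*) Z_2$, since $A X^k B - C = A (X^k - X^*) B$.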

2. Convergence Analysis and Rate Characterization

The fundamental convergence guarantee is global $Q$-linear convergence in expectation. Let $Z$ be the random update matrix and define

$$\rho = 1 - \lambda_{\min}\!\left(W^{-1/2}\, \mathbb{E}[Z]\, W^{-1/2}\right).$$

Under mild rank assumptions, $\rho < 1$ and

$$\mathbb{E}\|x^k - x^*\|_W^2 \leq \rho^k \|x^0 - x^*\|_W^2$$

(Gower et al., 2015, Gower, 2016). The rate $\rho$ is governed by the worst-case contraction over the distribution of sketches, with larger block sizes and importance sampling distributions improving the bound. For the matrix equation $AXB = C$,

$$\mathbb{E}\|X^{k} - X^*\|_{F(G)}^2 \leq \rho^k \|X^0 - X^*\|_{F(G)}^2, \quad \rho = 1 - \lambda_{\min}\!\left(\mathbb{E}[Z_2 \otimes Z_1']\right)$$

(Bao et al., 2023).
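For the vector case with $W = I$ and uniform single-row sketches, $\mathbb{E}[Z]$ has the closed form $\frac{1}{m}\sum_i a_i a_i^T / \|a_i\|^2$, so $\rho$ can be computed directly and compared against the observed decay. A small numerical check (illustrative dimensions and sample sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 8
A = rng.standard_normal((m, n))

# With W = I and S = e_i sampled uniformly, E[Z] = (1/m) sum_i a_i a_i^T / ||a_i||^2.
EZ = sum(np.outer(A[i], A[i]) / (A[i] @ A[i]) for i in range(m)) / m
rho = 1.0 - np.linalg.eigvalsh(EZ)[0]        # rho = 1 - lambda_min(E[Z])

# Compare the bound E||x^k - x*||^2 <= rho^k ||x^0 - x*||^2 with the mean
# squared error of randomized Kaczmarz averaged over independent runs.
x_star = rng.standard_normal(n)
b = A @ x_star
k = 100
sq_errs = []
for _ in range(200):
    x = np.zeros(n)
    for _ in range(k):
        i = rng.integers(m)
        a = A[i]
        x = x - a * (a @ x - b[i]) / (a @ a)
    sq_errs.append(np.sum((x - x_star) ** 2))
mean_sq_err = float(np.mean(sq_errs))
bound = rho ** k * np.sum(x_star ** 2)
```

Since the bound holds in expectation, the empirical mean over many runs should fall below `bound` (typically by a comfortable margin, as $\rho$ reflects the worst-case direction).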

For iterative sketch-and-project applied to preconditioned least squares, the spectrum of the sketched Gram matrix $U^T S^T S U$ for $A = U D V^T$ plays a pivotal role. Embedding and convergence probabilities can be sharply predicted in the high-dimensional, tall-data limit using tools from random matrix theory, in particular the Tracy–Widom law for the largest eigenvalue of a Wishart matrix (Ahfock et al., 2022).

3. Sketching Matrix Choices and Embedding Guarantees

The efficacy of sketch-and-project methods depends sensitively on the choice of random sketching matrix $S$:

  • Gaussian: $S_{ij} \sim \mathcal{N}(0, 1/m)$ i.i.d.
  • SRHT (Subsampled Randomized Hadamard Transform): $S = \Phi H D / \sqrt{m}$ for $H$ a Hadamard matrix, $D$ random signs, and $\Phi$ uniform sampling.
  • Sparse sketches: Clarkson–Woodruff, CountSketch, LESS embeddings (leverage score sampling with $O(n \log n)$ nonzeros per row) (Dereziński et al., 2022).
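The three families can be sketched in a few lines of NumPy. Scaling conventions vary across papers; here each sketch is normalized so that $\|SAx\|_2^2 \approx \|Ax\|_2^2$ in expectation, and the SRHT variant assumes the row count is a power of two:

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_sketch(A, q):
    """Dense Gaussian sketch: i.i.d. N(0, 1/q) entries; SA costs O(q m d)."""
    S = rng.standard_normal((q, A.shape[0])) / np.sqrt(q)
    return S @ A

def srht_sketch(A, q):
    """SRHT: SA = (1/sqrt(q)) Phi H D A via a fast Walsh-Hadamard transform
    on the rows; requires the number of rows m to be a power of two."""
    m = A.shape[0]
    X = rng.choice([-1.0, 1.0], size=m)[:, None] * A   # D: random signs
    h = 1
    while h < m:                                       # H: unnormalized FWHT
        for i in range(0, m, 2 * h):
            top, bot = X[i:i + h].copy(), X[i + h:i + 2 * h].copy()
            X[i:i + h], X[i + h:i + 2 * h] = top + bot, top - bot
        h *= 2
    rows = rng.choice(m, size=q, replace=False)        # Phi: uniform sampling
    return X[rows] / np.sqrt(q)

def countsketch(A, q):
    """CountSketch (Clarkson-Woodruff): each row of A is hashed to one of q
    buckets with a random sign; SA costs O(nnz(A))."""
    m = A.shape[0]
    buckets = rng.integers(q, size=m)
    signs = rng.choice([-1.0, 1.0], size=m)
    SA = np.zeros((q, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)         # unbuffered accumulation
    return SA
```

All three return a $q \times d$ sketched matrix $SA$; the trade-off is between the embedding constants (best for Gaussian) and the cost of applying the sketch (best for CountSketch).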

A sketch $S$ is an $\varepsilon$-subspace embedding for $A$ if

$$(1-\varepsilon)\|A z\|_2^2 \leq \|S A z\|_2^2 \leq (1+\varepsilon)\|A z\|_2^2 \quad \forall z \in \mathbb{R}^d,$$

or equivalently all eigenvalues of $U^T S^T S U$ lie in $[1-\varepsilon, 1+\varepsilon]$.
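The eigenvalue characterization is convenient to check numerically: compute an orthonormal basis $U$ of $\mathrm{range}(A)$ and measure how far the spectrum of $U^T S^T S U$ strays from 1. A small self-contained check with illustrative sizes and a Gaussian sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

def embedding_distortion(A, S):
    """Smallest eps such that S is an eps-subspace embedding for A:
    the max deviation of the eigenvalues of U^T S^T S U from 1."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)    # orthonormal basis of range(A)
    lam = np.linalg.eigvalsh(U.T @ (S.T @ (S @ U)))
    return max(1.0 - lam[0], lam[-1] - 1.0)

n, d, q = 2000, 20, 400
A = rng.standard_normal((n, d))
S = rng.standard_normal((q, n)) / np.sqrt(q)           # Gaussian sketch
eps = embedding_distortion(A, S)
```

For a Gaussian sketch the eigenvalues concentrate around $(1 \pm \sqrt{d/q})^2$, so with $q = 20d$ the measured distortion lands well below 1.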

The minimum sketch size $m$ to achieve a given distortion and failure probability can be characterized asymptotically using Tracy–Widom fluctuations: set

$$\frac{\varepsilon + 1 - \mu_{m,d}}{\sigma_{m,d}} = F_1^{-1}(1-\delta),$$

where $\mu_{m,d}, \sigma_{m,d}$ are centering and scaling constants and $F_1$ is the Tracy–Widom(1) distribution (Ahfock et al., 2022). For block sizes $m \gg d$, the required $m$ can be much smaller than classical worst-case bounds.

4. Relationship to Randomized SVD and Low-Rank Approximation

The same sketching/projection operators used in iterative algorithms also drive the analysis of randomized SVD and low-rank approximation. For the projection operator $P_S$,

$$\rho(A, k) = \lambda_{\min}\!\left(\mathbb{E}[P_S]\right),$$

and the randomized SVD error

$$\mathrm{Err}(A, k) = \|A - P_S A\|_F^2.$$

The per-iteration convergence rate for sketch-and-project solvers is tightly lower bounded by

$$\rho(A, k) \gtrsim \frac{k\,\sigma_{\min}^2(A)}{\mathrm{Err}(A, k-1)},$$

with super-linear improvement when the spectrum of $A$ decays rapidly (polynomially or exponentially) (Dereziński et al., 2022). Sparse sketches (LESS, CountSketch, leverage-score sampling) retain the same rate up to $O(1/\sqrt{r})$ error, where $r$ is the stable rank, even when the sketch density is radically reduced.
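The error quantity $\mathrm{Err}(A, k)$ is straightforward to estimate by Monte Carlo. The snippet below (illustrative sizes, Gaussian sketches) does so for a matrix with polynomially decaying spectrum and compares the result against the optimal rank-$k$ error $\sum_{i>k} \sigma_i^2$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Test matrix with polynomially decaying spectrum sigma_i = 1/i.
n = 60
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 1.0 / np.arange(1, n + 1)
A = U @ np.diag(sigma) @ V.T

def rand_svd_sq_err(A, k):
    """||A - P_S A||_F^2 with P_S the orthogonal projector onto range(A S),
    for a Gaussian test matrix S with k columns."""
    S = rng.standard_normal((A.shape[1], k))
    Q, _ = np.linalg.qr(A @ S)
    return np.linalg.norm(A - Q @ (Q.T @ A), "fro") ** 2

k = 10
err_k = float(np.mean([rand_svd_sq_err(A, k) for _ in range(100)]))
opt_k = float(np.sum(sigma[k:] ** 2))      # optimal rank-k squared error
```

By Eckart–Young, every sample satisfies $\mathrm{Err}(A, k) \geq \sum_{i>k} \sigma_i^2$, and the rapid spectral decay keeps the randomized error within a small multiple of that optimum.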

5. Asymptotic and Non-Asymptotic Rate Results

Classical non-asymptotic theory (e.g., Tropp, 2011) gives that an $\varepsilon$-embedding is achieved with $m = O\!\left(\varepsilon^{-2}(d + \log(1/\delta))\right)$ or, for SRHT, $m = O\!\left(\varepsilon^{-2}(\sqrt{d}+\sqrt{\log n})^2 \log(d/\delta)\right)$. The Tracy–Widom-based theory delivers accurate, sharp predictions for empirical failure rates and convergence probabilities in the "tall and thin" regime, showing that:

  • Much smaller $m$ often suffices in practice.
  • The empirical spectral distribution of the distortion matches the $F_1$ Tracy–Widom curve closely for $d \gtrsim 100$.
  • Block, LESS, or sub-Gaussian sketches behave nearly identically when $n \gg d$ and the leverage scores of $A$ are small (Ahfock et al., 2022, Dereziński et al., 2022).

6. Implementation Guidelines and Practical Strategies

Practical implementation proceeds by:

  1. Fixing the target distortion $\varepsilon$ and failure probability $\delta$.
  2. Solving for the minimum $m$ that guarantees the desired rate (using explicit Tracy–Widom or surrogate spectral bounds).
  3. Choosing $S$ as a Gaussian, SRHT, or Clarkson–Woodruff/LESS embedding, as appropriate for the computational constraints.
  4. For iterative methods, simulating the theoretical rate curves to confirm sharpness and to optimize the trade-off between per-iteration cost and overall runtime.
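These steps can be mocked up end to end. The snippet below substitutes a classical sub-Gaussian surrogate bound for step 2 (the constant 8 is illustrative, not the sharp Tracy–Widom quantile) and then measures the empirical failure rate of a Gaussian sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

# Step 1: target distortion and failure probability.
eps, delta = 0.5, 0.05
n, d = 2000, 20

# Step 2: surrogate bound m = O(eps^-2 (d + log(1/delta))); constant is illustrative.
m = int(np.ceil(8 * (d + np.log(1.0 / delta)) / eps ** 2))

# Steps 3-4: Gaussian sketch; estimate the failure rate over repeated draws.
A = rng.standard_normal((n, d))
U, _, _ = np.linalg.svd(A, full_matrices=False)
trials, fails = 50, 0
for _ in range(trials):
    SU = (rng.standard_normal((m, n)) / np.sqrt(m)) @ U
    lam = np.linalg.eigvalsh(SU.T @ SU)
    if lam[0] < 1.0 - eps or lam[-1] > 1.0 + eps:
        fails += 1
fail_rate = fails / trials
```

Because the surrogate bound is conservative, the observed failure rate typically sits well below the target $\delta$; the Tracy–Widom calculation replaces this slack with a near-exact sketch size.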

Empirical evaluations show that the Tracy–Widom-based selection of $m$ is within 5% of the empirical optimum, and that sparse sketches (with $O(n \log n)$ nonzeros per row) offer the same convergence behavior as dense ones for large $n$ (Ahfock et al., 2022, Dereziński et al., 2022).

Numerical results on large-scale real-world datasets (genetic data with $n \sim 4 \times 10^5$, $d \in [100, 1000]$; iterative least squares with $n = 5 \times 10^5$, $d = 90$) confirm the match between empirical embedding probabilities and theoretical rates for Gaussian, SRHT, and sparse sketches, and the failure of uniform sketching (which lacks invariance toward the Wishart limit).

7. Extensions, Limitations, and Comparisons

  • The sketch-and-project formalism subsumes a variety of classical and modern randomized iterative methods, including Kaczmarz, block Kaczmarz, coordinate descent, block Newton, random pursuit, and randomized matrix inversion methods via projection onto sketched constraints (Gower et al., 2015, Gower, 2016).
  • For block sizes approaching $d$, block and momentum-accelerated variants enable further empirical gains with efficient use of cache and parallel computation.
  • Uniform or leverage-score sketches that do not satisfy Wishart/pivot invariance may deviate from the Tracy–Widom predictions and typically underperform in high-coherence regimes.
  • The theoretical and experimental framework covers both one-shot sketching (randomized SVD, single-pass embedding) and online/iterative sketch-and-project methods, with rigorous non-asymptotic spectral and empirical convergence bounds.

In summary, randomized sketch-and-project methods provide a mathematically sharp, algorithmically versatile architecture for randomized linear algebra and optimization, where the interplay of random matrix theory, sketching constructions, and iterative projective updates yields both deep theoretical guarantees and robust, efficient large-scale solvers (Gower et al., 2015, Ahfock et al., 2022, Dereziński et al., 2022).
