Randomized Sketch-and-Project Methods
- Randomized sketch-and-project methods are iterative algorithms that use random projections to efficiently solve linear systems, least squares problems, and matrix equations.
- They achieve global Q-linear convergence in expectation, and they unify classical methods such as Kaczmarz and coordinate descent with modern random matrix theory.
- Selecting appropriate sketching matrices (e.g., Gaussian, SRHT, sparse sketches) is key to balancing computational cost and embedding guarantees for large-scale problems.
Randomized sketch-and-project methods refer to a broad and powerful class of iterative randomized algorithms for solving linear systems, least squares problems, linear feasibility, matrix equations, and related tasks in scientific computing. These algorithms leverage random projections (sketching) to reduce computational and storage complexity while maintaining precise analytic control of convergence rates and solution quality. Developed and unified in the past decade, they bridge classical row- or coordinate-style methods such as Kaczmarz and coordinate descent, randomized block and Newton strategies, and modern subspace projection schemes tied to random matrix theory and optimal embedding guarantees.
1. Formulation and Unified Framework
Given a consistent linear system $Ax = b$ with $A \in \mathbb{R}^{m \times n}$, randomized sketch-and-project methods proceed by iteratively projecting the current iterate onto the solution set of a randomly sketched subsystem. In its most general form, the update takes the form
$$x^{k+1} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \|x - x^k\|_B^2 \quad \text{subject to} \quad S^\top A x = S^\top b,$$
where $B \succ 0$ is a user-chosen geometry (weight) matrix, $S \in \mathbb{R}^{m \times \tau}$ is a random sketching matrix (often with $\tau \ll m$), and the norm is $\|x\|_B = \sqrt{x^\top B x}$ (Gower et al., 2015).
This update admits multiple equivalent interpretations:
- Sketch-and-project: Project in the $B$-norm onto the affine subspace defined by the sketched equations $S^\top A x = S^\top b$.
- Random update: Closed form $x^{k+1} = x^k - B^{-1} A^\top S\,(S^\top A B^{-1} A^\top S)^\dagger S^\top (A x^k - b)$ (see the code sketch after this list).
- Random linear solve: Solve a small projected problem.
- Randomized intersect and fixed-point forms: The update corresponds to intersection of affine spaces and fixed-point contraction.
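For concreteness, the closed-form update can be written in a few lines of NumPy. This is a minimal sketch assuming a dense sketching matrix $S$ and a pseudoinverse for the small $\tau \times \tau$ system; the function name and defaults are illustrative, not taken from the cited papers.

```python
import numpy as np

def sketch_and_project_step(x, A, b, S, B_inv=None):
    """One sketch-and-project update for Ax = b.

    Implements the closed form
        x+ = x - B^{-1} A^T S (S^T A B^{-1} A^T S)^+ S^T (A x - b),
    with B_inv defaulting to the identity (Euclidean geometry).
    """
    if B_inv is None:
        B_inv = np.eye(A.shape[1])
    SA = S.T @ A                        # sketched system matrix (tau x n)
    G = SA @ B_inv @ SA.T               # small tau x tau Gram matrix
    resid = S.T @ (A @ x - b)           # sketched residual
    return x - B_inv @ SA.T @ np.linalg.pinv(G) @ resid
```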
Key classical methods arise as special cases:
- Randomized Kaczmarz (RK): $B = I$, $S = e_i$ a randomly sampled coordinate vector, giving projection onto the single hyperplane $a_i^\top x = b_i$ (see the usage snippet after this list).
- Randomized coordinate descent (CD): $A$ symmetric positive definite, $B = A$, $S = e_i$.
- Randomized Newton/block Kaczmarz: block (column-subset) selection for $S$.
- Gaussian Kaczmarz/pursuit: $S \sim \mathcal{N}(0, I_m)$, a dense Gaussian sketch vector (Gower et al., 2015, Gower, 2016).
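A hedged usage example, reusing `sketch_and_project_step` from the snippet above: different draws of $S$ recover the classical special cases. The data sizes and iteration count here are arbitrary.

```python
import numpy as np  # sketch_and_project_step as defined above

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true  # consistent system

x = np.zeros(n)
for _ in range(2000):
    i = rng.integers(m)
    S = np.zeros((m, 1)); S[i, 0] = 1.0          # S = e_i: randomized Kaczmarz
    # S = rng.standard_normal((m, 1))            # Gaussian Kaczmarz
    # S = np.eye(m)[:, rng.choice(m, 10, replace=False)]  # block Kaczmarz
    x = sketch_and_project_step(x, A, b, S)

print(np.linalg.norm(x - x_true))  # error should be small after enough steps
```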
For more complex problems such as the matrix equation $AXB = C$, (block) sketch-and-project methods project in a matrix-weighted Frobenius norm onto the solution set of the doubly sketched equation $S^\top A X B T = S^\top C T$, yielding updates of the form
$$X^{k+1} = X^k - A^\top S\,(S^\top A A^\top S)^\dagger\, S^\top (A X^k B - C)\, T\,(T^\top B^\top B T)^\dagger\, T^\top B^\top,$$
where $S$ and $T$ are left and right sketches for $A$ and $B$, respectively (Bao et al., 2023).
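The doubly sketched update admits a similarly compact implementation. This is a minimal sketch under the plain Frobenius-norm geometry, with pseudoinverse solves for the two small sketched Gram matrices; the helper name is hypothetical.

```python
import numpy as np

def matrix_eq_step(X, A, B, C, S, T):
    """One doubly sketched projection step for the matrix equation AXB = C.

    Projects X in the Frobenius norm onto the solutions of
    S^T A X B T = S^T C T (cf. Bao et al., 2023).
    """
    SA = S.T @ A                          # left-sketched A (tau1 x n)
    BT = B @ T                            # right-sketched B (p x tau2)
    R = S.T @ (A @ X @ B - C) @ T         # doubly sketched residual
    Gl = np.linalg.pinv(SA @ SA.T)        # (S^T A A^T S)^+
    Gr = np.linalg.pinv(BT.T @ BT)        # (T^T B^T B T)^+
    return X - SA.T @ Gl @ R @ Gr @ BT.T
```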
2. Convergence Analysis and Rate Characterization
The fundamental convergence guarantee is global $Q$-linear convergence in expectation. Let $Z = A^\top S (S^\top A B^{-1} A^\top S)^\dagger S^\top A$ denote the (random) update matrix, and define the rate
$$\rho = 1 - \lambda_{\min}\big(B^{-1/2}\,\mathbb{E}[Z]\,B^{-1/2}\big).$$
Under mild rank/exactness assumptions, $0 \le \rho < 1$ and
$$\mathbb{E}\big[\|x^k - x^*\|_B^2\big] \le \rho^k\,\|x^0 - x^*\|_B^2$$
(Gower et al., 2015, Gower, 2016). The rate is governed by the worst-case contraction over the distribution of sketches; larger block sizes and well-chosen importance-sampling distributions improve the bound. An analogous expected-rate result, with the contraction defined through the pair of sketches $(S, T)$, holds for the matrix equation $AXB = C$ (Bao et al., 2023).
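The rate $\rho$ can also be estimated numerically by averaging the random projection $Z$ over many sketch draws (here with $B = I$). The following diagnostic helper is a hypothetical illustration, not code from the cited works.

```python
import numpy as np

def estimate_rate(A, sketch_draw, n_samples=500):
    """Monte Carlo estimate of rho = 1 - lambda_min(E[Z]) for B = I, where
    Z = A^T S (S^T A A^T S)^+ S^T A is the random projection onto
    range(A^T S)."""
    n = A.shape[1]
    EZ = np.zeros((n, n))
    for _ in range(n_samples):
        S = sketch_draw()                 # draw S in R^{m x tau}
        SA = S.T @ A
        EZ += SA.T @ np.linalg.pinv(SA @ SA.T) @ SA
    EZ /= n_samples
    return 1.0 - np.linalg.eigvalsh(EZ)[0]   # smallest eigenvalue of E[Z]
```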
For iterative sketch-and-project applied to preconditioned least squares, the spectrum of the sketched Gram matrix $(SA)^\top SA$ plays a pivotal role. Embedding and convergence probabilities can be sharply predicted in the high-dimensional, tall-data limit using tools from random matrix theory, in particular the Tracy–Widom law for the largest eigenvalue of a Wishart matrix (Ahfock et al., 2022).
3. Sketching Matrix Choices and Embedding Guarantees
The efficacy of sketch-and-project methods depends sensitively on the choice of random sketching matrix $S$ (written in this section as $S \in \mathbb{R}^{\tau \times m}$ acting by left multiplication, $A \mapsto SA$):
- Gaussian: entries i.i.d. $S_{ij} \sim \mathcal{N}(0, 1/\tau)$, so that $\mathbb{E}[S^\top S] = I$.
- SRHT (Subsampled Randomized Hadamard Transform): $S = \sqrt{m/\tau}\, P H D$ for a (normalized) Hadamard matrix $H$, a diagonal matrix $D$ of random signs, and a uniform row-sampling matrix $P$.
- Sparse sketches: Clarkson–Woodruff/CountSketch, and LESS embeddings (leverage-score-based sparsification with a small number of nonzeros per row) (Dereziński et al., 2022).
A sketch $S$ is an $\epsilon$-subspace embedding for $\mathrm{range}(A)$ if
$$(1 - \epsilon)\,\|Ax\|_2^2 \;\le\; \|SAx\|_2^2 \;\le\; (1 + \epsilon)\,\|Ax\|_2^2 \quad \text{for all } x,$$
or equivalently, all eigenvalues of $(SU)^\top SU$ lie in $[1 - \epsilon,\, 1 + \epsilon]$, where $U$ is an orthonormal basis of $\mathrm{range}(A)$.
The minimum sketch size $\tau$ needed to achieve a given distortion $\epsilon$ with failure probability $\delta$ can be characterized asymptotically using Tracy–Widom fluctuations of the extreme eigenvalues:
$$\Pr\!\big[\lambda_{\max}\big((SU)^\top SU\big) > 1 + \epsilon\big] \;\approx\; 1 - F_{\mathrm{TW}_1}\!\Big(\tfrac{(1+\epsilon) - \mu_{\tau,d}}{\sigma_{\tau,d}}\Big),$$
where $\mu_{\tau,d}, \sigma_{\tau,d}$ are centering and scaling constants and $F_{\mathrm{TW}_1}$ is the Tracy–Widom(1) distribution (Ahfock et al., 2022). For moderate block sizes, the required $\tau$ can be much smaller than classical worst-case bounds suggest.
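The failure probability that the Tracy–Widom theory predicts can also be estimated by direct simulation. A minimal sketch, assuming a Gaussian $S$ normalized so that $\mathbb{E}[S^\top S] = I$; the helper name is illustrative.

```python
import numpy as np

def embedding_failure_prob(U, tau, eps, trials=1000, seed=0):
    """Empirical probability that a Gaussian sketch of size tau fails to
    be an eps-subspace embedding for range(U), where U has orthonormal
    columns. A simulation stand-in for the Tracy-Widom asymptotics."""
    rng = np.random.default_rng(seed)
    m, d = U.shape
    fails = 0
    for _ in range(trials):
        S = rng.standard_normal((tau, m)) / np.sqrt(tau)   # E[S^T S] = I
        evals = np.linalg.eigvalsh((S @ U).T @ (S @ U))
        if evals[0] < 1 - eps or evals[-1] > 1 + eps:
            fails += 1
    return fails / trials
```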
4. Relationship to Randomized SVD and Low-Rank Approximation
The same sketching/projection operators that appear in iterative algorithms also drive the analysis of randomized SVD and low-rank approximation. For the random projection operator $P$ onto $\mathrm{range}(A^\top S)$, the expected update matrix of sketch-and-project coincides with $\mathbb{E}[P]$, and the randomized SVD error is governed by the expected residual $\mathbb{E}\,\|A(I - P)\|_F^2$. The per-iteration convergence rate of sketch-and-project solvers is tightly characterized through the smallest eigenvalue of this expected projection, $1 - \rho = \lambda_{\min}(\mathbb{E}[P])$, with super-linear improvement in the sketch size when the spectrum of $A$ decays rapidly (polynomially or exponentially) (Dereziński et al., 2022). Sparse sketches (LESS, CountSketch, leverage-score sampling) retain the same rate up to lower-order error terms controlled by the stable rank $\mathrm{sr}(A) = \|A\|_F^2 / \|A\|_2^2$, even when the sketch density is radically reduced.
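The residual quantity $\|A(I - P)\|_F$ linking the two analyses can be computed with a standard randomized range finder. The following is a minimal Gaussian-sketch version, not the exact experimental setup of the cited paper; the function name is illustrative.

```python
import numpy as np

def range_finder_error(A, tau, seed=0):
    """Frobenius residual ||A (I - P)||_F of a rank-tau randomized range
    finder over range(A^T S) -- the quantity linking sketch-and-project
    rates to randomized SVD (cf. Derezinski et al., 2022)."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((A.shape[0], tau))
    Q, _ = np.linalg.qr(A.T @ S)          # orthonormal basis of range(A^T S)
    return np.linalg.norm(A - (A @ Q) @ Q.T, ord="fro")
```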
5. Asymptotic and Non-Asymptotic Rate Results
Classical non-asymptotic theory (e.g., Tropp, 2011) shows that an $\epsilon$-embedding is achieved with $\tau = O(\epsilon^{-2}(d + \log(1/\delta)))$ for sub-Gaussian sketches or, for the SRHT, with an additional logarithmic oversampling factor. The Tracy–Widom-based theory delivers accurate, sharp predictions for empirical failure rates and convergence probabilities in the "tall and thin" regime ($m \gg d$), showing that:
- A much smaller $\tau$ often suffices in practice.
- The empirical distribution of the extremal singular-value distortion matches the Tracy–Widom curve closely even for moderate sketch sizes.
- Block, LESS, and sub-Gaussian sketches behave nearly identically in this regime when the leverage scores of $A$ are small (Ahfock et al., 2022, Dereziński et al., 2022).
6. Implementation Guidelines and Practical Strategies
Practical implementation proceeds by the following steps (see the code sketch after this list):
- Fixing the target distortion $\epsilon$ and failure probability $\delta$.
- Solving for the minimum sketch size $\tau$ that guarantees the desired rate (using explicit Tracy–Widom or surrogate spectral bounds).
- Choosing $S$ as a Gaussian, SRHT, or Clarkson–Woodruff/LESS embedding, as appropriate for the computational constraints.
- For iterative methods, simulating the theoretical rate curves to confirm sharpness and to optimize the trade-off between per-iteration cost and overall runtime.
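As one concrete instance of these guidelines, a CountSketch operator can be drawn in a few lines and plugged into the rate estimator sketched in Section 2. The tuning loop in the comments is a hypothetical recipe; the 0.9 threshold is arbitrary.

```python
import numpy as np

def countsketch(m, tau, seed=None):
    """Draw a (dense-stored) CountSketch matrix of shape tau x m: each
    column has a single random sign in a uniformly random row. In
    production this would be stored sparse; dense here for clarity."""
    rng = np.random.default_rng(seed)
    S = np.zeros((tau, m))
    S[rng.integers(tau, size=m), np.arange(m)] = rng.choice([-1.0, 1.0], size=m)
    return S

# Hypothetical tuning loop, reusing estimate_rate from Section 2:
#   for tau in (8, 16, 32, 64):
#       rho = estimate_rate(A, lambda: countsketch(A.shape[0], tau).T)
#       if rho < 0.9:
#           break
```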
Empirical evaluations show that the Tracy–Widom-based selection of $\tau$ is within 5% of the empirical optimum, and that sparse sketches (with a small number of nonzeros per row) offer the same convergence behavior as dense ones at large problem sizes (Ahfock et al., 2022, Dereziński et al., 2022).
Numerical results on large-scale real-world datasets (genetic data; iterative least squares on tall design matrices) confirm the match between empirical embedding probabilities and theoretical rates for Gaussian, SRHT, and sparse sketches, as well as the failure of uniform sampling sketches, which lack the rotational invariance underlying the Wishart limit.
7. Extensions, Limitations, and Comparisons
- The sketch-and-project formalism subsumes a variety of classical and modern randomized iterative methods, including Kaczmarz, block Kaczmarz, coordinate descent, block Newton, random pursuit, and randomized matrix inversion methods via projection onto sketched constraints (Gower et al., 2015, Gower, 2016).
- For growing block sizes, block and momentum-accelerated variants enable further empirical gains through efficient use of cache and parallel computation.
- Uniform or leverage-score sketches that lack the invariance behind the Wishart limit may deviate from the Tracy–Widom predictions and typically underperform in high-coherence regimes.
- The theoretical and experimental framework covers both one-shot sketching (randomized SVD, single-pass embedding) and online/iterative sketch-and-project methods, with rigorous non-asymptotic spectral and empirical convergence bounds.
In summary, randomized sketch-and-project methods provide a mathematically sharp, algorithmically versatile architecture for randomized linear algebra and optimization, where the interplay of random matrix theory, sketching constructions, and iterative projective updates yields both deep theoretical guarantees and robust, efficient large-scale solvers (Gower et al., 2015, Ahfock et al., 2022, Dereziński et al., 2022).