
Randomized Sketching in Linear Algebra

Updated 6 February 2026
  • Randomized sketching is a dimensionality reduction technique that compresses high-dimensional matrices while preserving essential subspace properties.
  • It employs various methods like Gaussian, SRHT, and CountSketch to enable efficient algorithms for regression, low-rank approximation, matrix decompositions, tensor factorization, and graph analysis.
  • The approach offers rigorous theoretical guarantees, statistical inference tools, and structure-aware extensions that reduce computational and storage costs in large-scale numerical problems.

Randomized sketching is a paradigm for dimensionality reduction in large-scale numerical linear algebra and data science, employing random projections or subsampling techniques to compress high-dimensional data matrices while preserving their essential subspace or operator geometry. Sketching enables efficient algorithms for regression, low-rank approximation, matrix decompositions, linear systems, tensor factorization, and graph analysis by substantially reducing computational and storage costs. This entry reviews the principal constructions, theoretical guarantees, methodologies, and domain-specific applications of randomized sketching, emphasizing recent results on convergence, statistical properties, computational complexity, and structure-aware extensions.

1. Core Sketching Constructions and Subspace Embeddings

A sketching matrix $S \in \mathbb{R}^{m \times n}$ with $m \ll n$ is applied to a tall data matrix $A \in \mathbb{R}^{n \times d}$, yielding a compressed version $SA \in \mathbb{R}^{m \times d}$. The sketch $S$ is designed to act as a $(1 \pm \varepsilon)$ $\ell_2$-subspace embedding for the $k$-dimensional column space $U$ of $A$ (with $k = \operatorname{rank}(A)$), satisfying for all $x \in \mathbb{R}^{d}$: $$(1-\varepsilon)\,\|Ax\|_2^2 \;\leq\; \|SAx\|_2^2 \;\leq\; (1+\varepsilon)\,\|Ax\|_2^2.$$
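
The following minimal NumPy sketch (an illustration with arbitrary dimensions, not values from any cited paper) checks this embedding property empirically for a Gaussian sketch: the squared norms $\|SAx\|_2^2$ stay within a few percent of $\|Ax\|_2^2$ for random test directions $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 20        # tall data matrix; here k = rank(A) = d
m = 400                  # sketch size (illustrative choice)

A = rng.standard_normal((n, d))
S = rng.standard_normal((m, n)) / np.sqrt(m)   # S_ij ~ N(0, 1/m)
SA = S @ A

for _ in range(5):
    x = rng.standard_normal(d)
    ratio = np.linalg.norm(SA @ x) ** 2 / np.linalg.norm(A @ x) ** 2
    print(f"||SAx||^2 / ||Ax||^2 = {ratio:.3f}")   # stays close to 1
```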

Major sketching matrix constructions and their typical embedding dimensions $m$ include:

| Sketch Type | Construction | Embedding Dimension |
|---|---|---|
| Gaussian/Sub-Gaussian | $S_{ij} \sim N(0, 1/m)$ | $O(k/\varepsilon^2)$ |
| SRHT/SRFT | Subsampled Hadamard/Fourier with random sign-flip | $O(k \log k / \varepsilon^2)$ |
| CountSketch | Each column: single nonzero $\pm 1$ at a random location | $O(k^2/\varepsilon^2)$ |
| Sparse Expander | Adjacency of a random $(k,\delta)$-expander, left degree $s = O(\log k / \varepsilon)$ | $O(k \log k / \varepsilon^2)$ |
| Leverage-Score Sampling | Row sampling by leverage scores, with reweighting | $O(k \log k / \varepsilon^2)$ |

Sparse sketches (e.g., magical graph, expander, or CountSketch) achieve input-sparsity time for large, potentially sparse data sets and admit either combinatorial or low-randomness constructions (Hu et al., 2021).
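
As an illustration of input-sparsity time, the following hedged sketch builds a CountSketch matrix (one random $\pm 1$ per column) as a SciPy sparse matrix and applies it to a sparse $A$; dimensions and density are arbitrary choices, not parameters from (Hu et al., 2021).

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
n, d, m = 100_000, 30, 2_000

A = sp.random(n, d, density=1e-3, format="csr", random_state=1)

rows = rng.integers(0, m, size=n)           # hash each input coordinate to a bucket
signs = rng.choice([-1.0, 1.0], size=n)     # random sign flip
S = sp.csr_matrix((signs, (rows, np.arange(n))), shape=(m, n))

SA = S @ A                                   # cost proportional to nnz(A)
print(SA.shape, A.nnz)
```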

2. Theoretical Guarantees and Statistical Properties

The performance of sketching algorithms is characterized by subspace-embedding properties, matrix concentration inequalities, and random matrix theory (RMT). For example, the Tracy–Widom law governs the fluctuations of the extreme singular values of a randomly projected subspace, enabling precise predictions of embedding success probabilities in the high-dimensional regime (Ahfock et al., 2022). For Gaussian $S$ and $A$ of size $n \times d$ with $n \gg d$ and $k$ rows in $S$, the largest squared singular value of $SU$ (with $U$ an orthonormal basis for the range of $A$) concentrates at $(1 + \sqrt{d/k})^2$ with $O(k^{-2/3})$ Tracy–Widom fluctuations, yielding accurate estimates for the probability that $S$ is an $\varepsilon$-embedding.
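
A rough empirical check of this concentration, under the setup assumed above (this is a sanity sketch, not the RMT analysis of Ahfock et al., 2022):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 20_000, 50, 1_000

A = rng.standard_normal((n, d))
U, _ = np.linalg.qr(A)                        # orthonormal basis of range(A)
S = rng.standard_normal((k, n)) / np.sqrt(k)  # Gaussian sketch, variance-1/k entries

sigma_max_sq = np.linalg.norm(S @ U, ord=2) ** 2
print(sigma_max_sq, (1 + np.sqrt(d / k)) ** 2)  # the two values nearly agree
```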

From a statistical learning perspective, two risk measures are central in least-squares regression (1406.5986, 1505.06659):

  • Prediction Efficiency (PE): ratio of the mean squared error of $X(\hat\beta_{\mathrm{sketch}} - \beta)$ for the sketched estimator to that of the OLS estimator. Typically $1 + O(n/r)$, requiring $r \approx n$ for PE near unity.
  • Residual Efficiency (RE): ratio of expected residual error. Scales as $1 + O(p/r)$, requiring only $r \approx p$ for RE near unity.

A worst-case (algorithmic) guarantee for residual fitting requires $r = O(p/\varepsilon)$ to ensure a $(1+\varepsilon)$ bound for all $Y$, regardless of statistical assumptions.

Lower bounds confirm that predictions cannot be preserved with $r \ll n$: no single sketching scheme can avoid a $1 + O(n/r)$ inflation in prediction error (1406.5986, 1505.06659).
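
The gap between the two criteria is easy to see in simulation. The following hedged sketch (complete sketching of both $X$ and $y$ with a Gaussian $S$; all sizes are illustrative, and this is not code from the cited papers) shows the residual staying near the OLS residual while the prediction error is inflated by roughly $n/r$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 5_000, 20, 200

X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

S = rng.standard_normal((r, n)) / np.sqrt(r)            # complete (Gaussian) sketch
beta_sk, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)

pe = np.sum((X @ (beta_sk - beta)) ** 2) / np.sum((X @ (beta_ols - beta)) ** 2)
re = np.sum((y - X @ beta_sk) ** 2) / np.sum((y - X @ beta_ols) ** 2)
print(f"prediction-error inflation ~ {pe:.1f}  (order n/r = {n/r:.0f})")
print(f"residual inflation         ~ {re:.3f}  (stays near 1)")
```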

3. Algorithmic Methodologies Across Domains

Randomized sketching is integrated at multiple algorithmic levels:

(a) Ordinary Least Squares (OLS)

Sketch-and-solve: compute $\hat\beta = \arg\min_\beta \|SY - SX\beta\|_2^2$. Fast algorithms leverage subspace embedding properties to guarantee residual accuracy with $r \approx p$, but require iterative or preconditioned methods (e.g., iterative Hessian sketch) for accurate parameter estimation with $r \ll n$ (Chen et al., 8 Sep 2025).
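
A minimal iterative Hessian sketch loop, included as a hedged illustration of the generic scheme (exact gradient, freshly sketched Hessian at each step); the sketch size and iteration count are arbitrary choices, not recommendations from (Chen et al., 8 Sep 2025).

```python
import numpy as np

def ihs(X, y, r, iters=8, rng=None):
    """Iterative Hessian sketch: exact gradient, sketched Hessian per step."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        S = rng.standard_normal((r, n)) / np.sqrt(r)
        SX = S @ X
        grad = X.T @ (y - X @ beta)
        beta = beta + np.linalg.solve(SX.T @ SX, grad)
    return beta

rng = np.random.default_rng(4)
n, p = 20_000, 50
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

beta_ihs = ihs(X, y, r=500, rng=rng)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.linalg.norm(beta_ihs - beta_ols))   # approaches the OLS solution
```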

(b) Iterative and Preconditioned Solvers

Preconditioned Richardson iteration and iterative Hessian sketch use sketches to form computationally efficient preconditioners, yielding linear (geometric) convergence rates. For example, sketch-based RZF beamforming in massive MIMO is solved via a preconditioned Richardson iteration, with the sketch-preconditioned error contracting geometrically, as $\varepsilon^t$ after $t$ iterations (Choi et al., 2019).
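
The sketch below illustrates the same mechanism on a generic regularized normal-equation system $(X^\top X + \lambda I)\beta = X^\top y$, with the preconditioner built from a sketched Gram matrix; it mirrors the idea rather than the exact algorithm of (Choi et al., 2019), and all sizes are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, r, lam = 10_000, 100, 1_000, 1.0

X = rng.standard_normal((n, p))
b = X.T @ rng.standard_normal(n)

A = X.T @ X + lam * np.eye(p)                        # system matrix
SX = (rng.standard_normal((r, n)) / np.sqrt(r)) @ X
P_inv = np.linalg.inv(SX.T @ SX + lam * np.eye(p))   # sketched preconditioner

beta = np.zeros(p)
for t in range(20):
    residual = b - A @ beta
    beta = beta + P_inv @ residual                   # Richardson step
    if t % 5 == 0:
        print(t, np.linalg.norm(residual))           # geometric decrease
```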

(c) Krylov and Matrix Function Methods

Randomized sketching is applied to Krylov subspace methods for matrix functions:

  • Sketched FOM/GMRES: impose (minimal-residual or Galerkin) conditions in the sketched norm, reducing orthogonalization and storage costs from $O(Nm^2)$ to $O(sm^2)$ while preserving convergence up to a $(1 \pm \varepsilon)$ factor (Güttel et al., 2022; Burke et al., 2023); a minimal sketched-GMRES example follows this list.
  • Recycling and Deflation: Compress augmented Krylov or recycle spaces via sketched harmonic Ritz decompositions (Burke et al., 2023).
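
The following is a hedged, minimal sketch-and-solve GMRES: the Krylov basis is only locally (truncated) orthogonalized, and the residual is minimized in the sketched norm via a small least-squares problem. Basis whitening, restarting, and the other refinements of the cited papers are omitted, and the test matrix is an arbitrary well-conditioned example.

```python
import numpy as np

def sketched_gmres(A, b, m=40, s=400, trunc=2, rng=None):
    rng = rng or np.random.default_rng()
    n = b.shape[0]
    S = rng.standard_normal((s, n)) / np.sqrt(s)        # subspace embedding
    V = np.zeros((n, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):                               # truncated-orthogonalization Arnoldi
        w = A @ V[:, j - 1]
        for i in range(max(0, j - trunc), j):
            w -= (V[:, i] @ w) * V[:, i]
        V[:, j] = w / np.linalg.norm(w)
    y, *_ = np.linalg.lstsq(S @ (A @ V), S @ b, rcond=None)   # sketched LS problem
    return V @ y

rng = np.random.default_rng(6)
n = 2_000
A = np.diag(np.linspace(1.0, 2.0, n)) + 0.01 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x = sketched_gmres(A, b, rng=rng)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))    # small relative residual
```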

(d) High-Dimensional Tensors

Randomized sketching efficiently computes low-rank decompositions of tensors:

  • Single-mode sketching in Tucker decomposition avoids tall-and-fat sketches by sketching only along the small mode dimension, leading to substantial memory and runtime gains (Hashemi et al., 2023); a small randomized Tucker example follows this list.
  • Tensor ring decomposition is accelerated via Kronecker-subsampled randomized Fourier transforms or TensorSketch, exploiting multilinear structure for per-iteration costs $O(NmIR^2)$ (Yu et al., 2022).
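
A small randomized HOSVD-style Tucker computation, included to convey the flavor of sketched tensor factorization: each mode unfolding is compressed with a Gaussian sketch before extracting the factor. This is a hedged toy sketch (exact ranks, no oversampling or power iterations), not the single-mode algorithm of (Hashemi et al., 2023).

```python
import numpy as np

def randomized_tucker(T, ranks, rng=None):
    """Randomized HOSVD: sketch each mode unfolding, then form the core."""
    rng = rng or np.random.default_rng()
    factors = []
    for mode, r in enumerate(ranks):
        Tm = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)   # mode unfolding
        Q, _ = np.linalg.qr(Tm @ rng.standard_normal((Tm.shape[1], r)))
        factors.append(Q)                                         # sketched range basis
    core = T
    for mode, U in enumerate(factors):                            # core = T x_k U_k^T
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

rng = np.random.default_rng(7)
G = rng.standard_normal((5, 5, 5))                                # synthetic Tucker-rank-(5,5,5) tensor
U = [rng.standard_normal((s, 5)) for s in (60, 70, 80)]
T = np.einsum('abc,ia,jb,kc->ijk', G, *U)

core, facs = randomized_tucker(T, ranks=(5, 5, 5), rng=rng)
T_hat = np.einsum('abc,ia,jb,kc->ijk', core, *facs)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))              # tiny on exactly low-rank data
```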

(e) Graph and Community Detection

Randomized sketching compresses large graphs via node or edge sampling. Degree-based or spatially-aware sampling strategies preserve community recoverability in stochastic block models with a fraction of the data, yielding provable recovery with complexity independent of the ambient graph size in the sparse regime (Rahmani et al., 2018).

(f) Nonlinear and Black-Box Approximation

Randomized sketching is paired with rational approximation frameworks (e.g., AAA) for large-scale nonlinear operator surrogates, where high-dimensional function evaluations are compressed via random projections onto lower-dimensional probe spaces, yielding surrogates with pointwise approximation guarantees and accelerated computation (Güttel et al., 2022).

4. Structured and Domain-Aware Sketch Variants

Advanced sketching methods exploit and preserve problem-structure:

  • Leverage-score sampling: Adapts sampling probabilities to the statistical leverage of data rows for sharper dimension reduction in matrices with non-uniform row energy (Raskutti et al., 2015); see the sampling sketch after this list.
  • Higher-order/tensor sketches: E.g., Higher-order Count Sketch (HCS) exponentially reduces hash-function storage and enables direct tensor contraction approximations (Shi et al., 2019).
  • Graph-based sparse sketches: Sparse graph-based embeddings (expanders, magical graphs) attain subspace embedding error comparable to dense maps with $O(1)$ or $O(\log k)$ nonzeros per column, and can be constructed with reduced randomness via error-correcting codes (Hu et al., 2021).
  • Source sketching in PDE-constrained problems: Source dimensions in PDE-constrained optimization are projected via random projections into a smaller basis, drastically reducing the number of required PDE solves while controlling cross-talk noise through regularization (Aghazade et al., 2021).
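
A minimal leverage-score row-sampling sketch, assuming exact leverage scores computed from a thin QR (in practice the scores are themselves approximated, often by sketching); all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
n, d, m = 50_000, 20, 800

# data with highly non-uniform row energy
A = rng.standard_normal((n, d)) * rng.exponential(1.0, size=(n, 1))

Q, _ = np.linalg.qr(A)
lev = np.sum(Q ** 2, axis=1)                     # leverage scores, sum to d
probs = lev / lev.sum()

idx = rng.choice(n, size=m, replace=True, p=probs)
SA = A[idx] / np.sqrt(m * probs[idx])[:, None]   # reweighted sampled rows

# the sketched Gram matrix SA^T SA approximates A^T A
err = np.linalg.norm(SA.T @ SA - A.T @ A, 2) / np.linalg.norm(A.T @ A, 2)
print(f"relative spectral error of sketched Gram: {err:.3f}")
```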

5. Computational Complexity, Scalability, and Practical Recommendations

Sketching leads to substantial computational and memory savings. Key scaling relations include:

  • Forming the sketch: For dense $A$, Gaussian/SRHT approaches cost $O(ndm)$ or $O(nd\log m)$; for sparse $A$, input-sparsity time $O(\operatorname{nnz}(A))$ is achievable with, e.g., sparse or graph-based sketches (Hu et al., 2021).
  • Solving sketched problems: For least-squares, $O(md^2)$ after sketching, versus $O(nd^2)$ for the full problem; for tensors, costs scale with sketch sizes and core ranks rather than the full ambient dimensions (Hashemi et al., 2023; Yu et al., 2022).
  • Numerical stability and implementation: Randomized sketching enhances stability of block orthogonalization (e.g., RandCholQR in s-step GMRES), tolerating higher block condition numbers and delivering $O(\varepsilon)$ orthogonality at negligible added cost (Yamazaki et al., 20 Mar 2025).
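
As an illustration of sketch-enhanced orthogonalization, the following hedged example implements one common randomized Cholesky QR variant (sketch, precondition, then Cholesky QR); it captures the idea but is not the exact RandCholQR routine of (Yamazaki et al., 20 Mar 2025).

```python
import numpy as np

def rand_chol_qr(X, s=None, rng=None):
    """Sketch-preconditioned Cholesky QR of a tall matrix X."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    s = s or 4 * d
    S = rng.standard_normal((s, n)) / np.sqrt(s)
    _, R_s = np.linalg.qr(S @ X)          # R factor of the sketch
    W = np.linalg.solve(R_s.T, X.T).T     # precondition: W = X R_s^{-1} is well-conditioned
    R_w = np.linalg.cholesky(W.T @ W).T   # Cholesky QR on W
    Q = np.linalg.solve(R_w.T, W.T).T
    return Q, R_w @ R_s

rng = np.random.default_rng(9)
X = rng.standard_normal((20_000, 50)) @ np.diag(np.logspace(0, 6, 50))  # ill-conditioned
Q, R = rand_chol_qr(X, rng=rng)
print(np.linalg.norm(Q.T @ Q - np.eye(50)),             # loss of orthogonality
      np.linalg.norm(Q @ R - X) / np.linalg.norm(X))    # relative factorization residual
```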

Practical recommendations include using projection-based or structure-aware sketches for generality, exploiting domain structure via leverage scores or tensor/multimodal-aware variants, and performing empirical calibration of sketch size using RMT-derived formulas (e.g., distributions of the maximum Wishart eigenvalue) for optimal embedding success (Ahfock et al., 2022). For statistical reliability, sketch sizes should match the desired error criterion (residual fitting: $r = O(p)$; parameter estimation: $r = O(n)$).

6. Statistical Inference, Uncertainty Quantification, and Limit Laws

Randomized sketching introduces algorithmic randomness that must be accounted for in statistical inference. Recent frameworks provide:

  • Limiting distributions: Central limit theorems for sketched estimators in high-dimensional OLS identify specific forms of bias and variance inflation, with explicit expressions for both i.i.d. and structured sketches (e.g., SRHT) (Zhang et al., 2023).
  • Confidence intervals: Multiple inference methodologies—sub-randomization, multi-run plug-in, and aggregation—enable valid confidence sets for linear contrasts under sketching, calibrated via simulation or plug-in variance estimation at negligible computational overhead (Zhang et al., 2023); a rough multi-run illustration follows this list.
  • Bias correction: Partial sketches require post-hoc scaling to remove first-order bias, and variance inflation is explicitly quantifiable as a function of $m, p$ (e.g., $\sqrt{m/(m-p)}$ for complete sketching) (Zhang et al., 2023).
  • Cost and parallelism: Multi-run and sub-randomization methods can be parallelized, with total overhead dominated by $O(np\log n + mp^2)$ for one sketch; communication-efficient batching reduces data movement (Zhang et al., 2023).
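
The snippet below gives a rough flavor of the multi-run idea: the sketching-induced spread of a linear contrast $c^\top\hat\beta$ is estimated by repeating the sketch independently and used to form a crude normal-approximation interval around the aggregated estimate. It is a hedged toy illustration, not the calibrated procedures of (Zhang et al., 2023).

```python
import numpy as np

rng = np.random.default_rng(10)
n, p, m, runs = 20_000, 30, 600, 30

X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)
c = np.zeros(p); c[0] = 1.0                      # contrast of interest: first coefficient

estimates = []
for _ in range(runs):                            # independent complete sketches
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    b_hat, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    estimates.append(c @ b_hat)
estimates = np.array(estimates)

center = estimates.mean()
spread = estimates.std(ddof=1)                   # sketching-induced variability per run
print(f"aggregate estimate {center:.3f}, per-run spread {spread:.3f}")
print(f"crude interval [{center - 1.96 * spread:.3f}, {center + 1.96 * spread:.3f}]")
print(f"true contrast {c @ beta:.3f}")
```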

These advances provide statistical practitioners with rigorous tools for quantifying uncertainty induced by sketching in computational pipelines.

7. Emerging Directions and Open Problems

Research in randomized sketching continues to expand, with current frontiers including:

  • Tighter asymptotics: Tracy–Widom limit theory for sketching sharpens predictions for embedding quality beyond worst-case $\varepsilon$-bounds (Ahfock et al., 2022).
  • Extensions to non-linear models: Ongoing investigation extends the statistical vs. algorithmic trade-offs to generalized linear models, robust regression, and non-homoscedastic error distributions (Raskutti et al., 2014).
  • Integration with hardware accelerators: GPU-optimized implementations of sketched Krylov, block orthogonalization, and tensor operations deliver high performance at large scale (Yamazaki et al., 20 Mar 2025).
  • Inference beyond OLS: Recent methods generalize uncertainty quantification to broad classes of randomized algorithms, not limited to linear regression (Zhang et al., 2023).
  • Structure-exploiting sketches: Advances in sparse, graph-based, ECC-derandomized, and higher-order sketches enable efficient specialized algorithms for graphs, tensors, and large-scale networks (Hu et al., 2021; Shi et al., 2019).
  • Provable guarantees in new domains: Applications span massive MIMO beamforming (Choi et al., 2019), PDE-constrained inverse problems (Aghazade et al., 2021), stochastic block model community detection (Rahmani et al., 2018), nonlinear eigenproblems (Güttel et al., 2022), and others, each motivating domain-adapted extensions of the core randomized sketching principles.

The field thus constitutes a vibrant and foundational area of randomized numerical linear algebra, with active theoretical development and broadening practical impact.
