Least Squares Sketching
- Least squares sketching is a randomized method that compresses large-scale LS regression using projection techniques to preserve key data geometry.
- It leverages dense, structured, and sparse sketching schemes to balance computational efficiency with controlled statistical risk.
- Adaptive and iterative sketching methods further enhance accuracy and convergence in regularized, constrained, and tensor-structured problems.
Least squares sketching is the central paradigm in modern randomized numerical linear algebra for reducing the computational complexity and storage demands of large-scale least squares (LS) regression by leveraging random projections or row-sampling mechanisms. It enables the efficient approximation of LS estimators by transforming the original problem to a smaller sketched problem, yielding explicit tradeoffs among statistical efficiency, computational cost, and memory (Dobriban et al., 2018, Lacotte et al., 2021, Raskutti et al., 2014, Woodruff, 2014).
1. Definition and Motivation
Given a tall data matrix $X \in \mathbb{R}^{n \times d}$ with $n \gg d$ and response vector $y \in \mathbb{R}^n$, the principle of least squares sketching is to compress both $X$ and $y$ using a carefully chosen random “sketching” matrix $S \in \mathbb{R}^{m \times n}$ with $d \le m \ll n$, yielding sketched data $(SX, Sy)$. The compressed regression
$$\hat{\beta}_s = \arg\min_{\beta} \| SX\beta - Sy \|_2^2$$
approximates the full-data OLS estimator $\hat{\beta} = \arg\min_{\beta} \| X\beta - y \|_2^2$. The rationale is that, for suitable $S$, the geometry of the column space of $X$ (and the residual structure) is approximately preserved, guaranteeing the sketched solution possesses controlled error relative to the full solution. This reduction transforms the computational bottleneck from $O(nd^2)$ to $O(md^2)$ (plus the cost of applying the sketch) for dense data, or lower for sparse matrices (Woodruff, 2014).
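As a concrete illustration, the sketch-and-solve recipe above takes only a few lines of NumPy; the Gaussian sketch and all problem sizes below are illustrative choices, not prescriptions:

```python
import numpy as np

# Sketch-and-solve for overdetermined least squares (illustrative sizes).
rng = np.random.default_rng(0)
n, d, m = 10_000, 20, 400                      # n >> m >= d

X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Gaussian sketch: one admissible choice among the classes surveyed below.
S = rng.standard_normal((m, n)) / np.sqrt(m)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)            # cost ~ O(n d^2)
beta_sketch, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)  # cost ~ O(m d^2)

rel_err = np.linalg.norm(beta_sketch - beta_full) / np.linalg.norm(beta_full)
```

With $m$ an order of magnitude smaller than $n$, the sketched solution typically lands within a few percent of the full OLS solution on well-conditioned inputs.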
2. Randomized Sketching Schemes
The guarantee of high-fidelity approximation depends on the class of randomized sketching matrices, each with distinct embedding properties, flop counts, and storage requirements.
Main Classes:
- Dense Gaussian or Sub-Gaussian Sketches: $S_{ij} \sim \mathcal{N}(0, 1/m)$ or i.i.d. sub-Gaussian, with the classical Johnson–Lindenstrauss (JL) embedding property. Achieves subspace preservation with $m = O(d/\varepsilon^2)$ rows (Raskutti et al., 2014, Woodruff, 2014).
- Subsampled Randomized Hadamard Transform (SRHT): $S = \sqrt{n/m}\, P H D$, where $D$ is a diagonal matrix of random signs, $H$ a Hadamard (or similar fast orthogonal) transform, and $P$ randomly samples $m$ rows. Allows sketch application in $O(nd \log n)$ time (Dobriban et al., 2018, Woodruff, 2014).
- Sparse Sign Embeddings (CountSketch/Clarkson–Woodruff Sketch): Each column of $S$ has one or a few random $\pm 1$ nonzero entries; applicable in $O(\mathrm{nnz}(X))$ time (Woodruff, 2014).
- Leverage-Score Sampling: Rows of $(X, y)$ are sampled in proportion to their statistical leverage scores, with or without rescaling (Raskutti et al., 2014).
- Tensor-Structured Sketches: For Kronecker-structured $X$, row-wise or Kronecker-product sketching reduces cost further while maintaining embedding guarantees (Chen et al., 2019, Chen et al., 2020).
The effectiveness of each sketch depends on the embedding dimension, the preservation of subspace geometry, and the computational characteristics of the matrix–vector application.
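For instance, a sparse sign embedding can be applied in time proportional to $\mathrm{nnz}(X)$ by hashing each input row to a random bucket with a random sign. The following is a minimal sketch of the CountSketch construction; the function name, signature, and sizes are ours:

```python
import numpy as np
from scipy import sparse

# Minimal CountSketch (Clarkson–Woodruff): each input row is hashed to
# one of m buckets with a random sign, so S has one nonzero per column
# and S @ X costs O(nnz(X)). Names and parameters are illustrative.
def countsketch(X, m, rng):
    n = X.shape[0]
    buckets = rng.integers(0, m, size=n)         # hash: row i -> bucket
    signs = rng.choice([-1.0, 1.0], size=n)      # random sign per row
    S = sparse.csr_matrix((signs, (buckets, np.arange(n))), shape=(m, n))
    return S @ X

rng = np.random.default_rng(1)
n, d, m = 5000, 10, 1000
X = rng.standard_normal((n, d))
SX = countsketch(X, m, rng)
# Norms (and hence subspace geometry) are approximately preserved:
# ||S x|| is close to ||x|| for vectors x in the column space of X.
```

Because $S$ is never materialized densely, both the sketch application and its storage scale with the sparsity of the input rather than with $mn$.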
3. Statistical and Algorithmic Risk Analyses
The performance of least squares sketching is measured in both statistical and algorithmic frameworks.
Risk Measures:
- Variance Efficiency (VE): $\mathrm{VE} = \mathbb{E}\,\|\hat{\beta}_s - \beta\|_2^2 \,/\, \mathbb{E}\,\|\hat{\beta} - \beta\|_2^2$.
- Prediction Efficiency (PE): $\mathrm{PE} = \mathbb{E}\,\|X(\hat{\beta}_s - \beta)\|_2^2 \,/\, \mathbb{E}\,\|X(\hat{\beta} - \beta)\|_2^2$.
- Out-of-Sample Efficiency (OE): $\mathrm{OE} = \mathbb{E}\,[(x_{\mathrm{new}}^\top(\hat{\beta}_s - \beta))^2] \,/\, \mathbb{E}\,[(x_{\mathrm{new}}^\top(\hat{\beta} - \beta))^2]$, for a new test point $x_{\mathrm{new}}$.
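These risk ratios can be checked by simulation. The snippet below approximates VE by Monte Carlo under an assumed linear-Gaussian model with a Gaussian sketch; all sizes are illustrative:

```python
import numpy as np

# Monte Carlo estimate of variance efficiency (VE) under an assumed
# linear-Gaussian model; parameter choices are illustrative.
rng = np.random.default_rng(2)
n, d, m, reps, sigma = 2000, 10, 200, 50, 1.0
X = rng.standard_normal((n, d))
beta = np.ones(d)

mse_full = mse_sketch = 0.0
for _ in range(reps):
    y = X @ beta + sigma * rng.standard_normal(n)
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)     # fresh Gaussian sketch
    b_sk = np.linalg.lstsq(S @ X, S @ y, rcond=None)[0]
    mse_full += np.sum((b_full - beta) ** 2) / reps
    mse_sketch += np.sum((b_sk - beta) ** 2) / reps

ve = mse_sketch / mse_full    # > 1: sketching inflates estimation error
```

Since the sketched estimator is linear and unbiased (conditional on $S$), Gauss–Markov guarantees $\mathrm{VE} \ge 1$; the simulated ratio quantifies how much efficiency the compression sacrifices.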
Asymptotic Results (Dobriban et al., 2018):
In the double-asymptotic regime ($n, d, m \to \infty$ with aspect ratios $d/n \to \gamma$ and $m/n \to \xi$, where $0 < \gamma < \xi \le 1$), VE and OE converge to explicit closed-form functions of $(\gamma, \xi)$, with separate limits for i.i.d. (Gaussian/sub-Gaussian) sketches and for orthogonal (Haar/SRHT) sketches. The orthogonal limits are strictly smaller: for every fixed compression ratio $\xi$, Haar/SRHT sketches are strictly superior to i.i.d. Gaussian sketches, minimizing the excess statistical risk.
Algorithmic vs. Statistical Theory (Raskutti et al., 2014, Raskutti et al., 2015):
- Algorithmic (worst-case): with $m = O(d/\varepsilon^2)$, the sketched solution preserves the relative residual within a factor $(1+\varepsilon)$ of the optimal.
- Statistical (mean-squared error): to match OLS prediction efficiency (PE $\approx 1$), one requires $m \asymp n$; substantial dimensionality reduction ($d \ll m \ll n$) controls residual error but not predictive error.
Lower Bounds: For any single-shot sketch, PE cannot be better than order $n/m$; thus, statistical accuracy for prediction is fundamentally limited by the compression ratio (Raskutti et al., 2014, Raskutti et al., 2015).
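A small simulation makes this barrier concrete: under an assumed linear-Gaussian model, the prediction efficiency of single-shot Gaussian sketch-and-solve empirically tracks the compression ratio $n/m$ (all sizes below are illustrative):

```python
import numpy as np

# Empirical look at the single-shot lower bound: prediction efficiency
# of Gaussian sketch-and-solve scales like the compression ratio n/m.
rng = np.random.default_rng(3)
n, d, m, reps, sigma = 4000, 5, 200, 40, 1.0
X = rng.standard_normal((n, d))
beta = np.ones(d)

num = den = 0.0
for _ in range(reps):
    y = X @ beta + sigma * rng.standard_normal(n)
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    b_sk = np.linalg.lstsq(S @ X, S @ y, rcond=None)[0]
    num += np.sum((X @ (b_sk - beta)) ** 2) / reps   # sketched prediction risk
    den += np.sum((X @ (b_full - beta)) ** 2) / reps # OLS prediction risk

pe = num / den    # compare with the compression ratio n/m = 20
```

The residual error of the same sketched solution stays within a small constant factor of optimal, illustrating the gap between algorithmic and statistical guarantees.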
4. Advanced Adaptive and Iterative Methods
Recent work has advanced beyond fixed-dimension, single-shot sketching by developing adaptive and iterative methods that exploit the effective or statistical dimension of the problem.
- Effective Dimension Adaptive Sketching: The notion of effective dimension (e.g., $d_e = \operatorname{tr}\!\left[X^\top X\,(X^\top X + \lambda I)^{-1}\right]$, where the parameter $\lambda$ relates to the regularized spectrum) underpins algorithms where the embedding dimension dynamically increases until “sufficient progress” is detected, guaranteeing that the sketch size matches the intrinsic complexity, not the ambient dimension (Lacotte et al., 2020, Lacotte et al., 2021).
- Iterative Hessian Sketch (IHS): Rather than compressing both $X$ and $y$, each step sketches only the Hessian, yielding linear (or superlinear) convergence to the true solution using sketches of dimension matching the Gaussian width of the transformed tangent cone (the statistical dimension), typically much smaller than $n$ (Pilanci et al., 2014, Lacotte et al., 2021).
- Sequential Least-Squares Estimators: Algorithms such as SLSE-FRS combine sequentially increasing sketch sizes with fast iterative solvers, attaining the accuracy of full data at reduced cost (Chen et al., 8 Sep 2025).
These methods achieve nearly optimal statistical and computational guarantees, particularly when the spectrum of $X$ is rapidly decaying.
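The IHS iteration in the unconstrained case can be sketched as follows; the use of fresh Gaussian sketches per step and the specific sizes are assumptions made for illustration:

```python
import numpy as np

# Iterative Hessian Sketch (unconstrained case): each step sketches only
# X to approximate the Hessian X^T X, while the full-data residual term
# drives the update. Sketch type and sizes are illustrative choices.
rng = np.random.default_rng(4)
n, d, m, iters = 5000, 15, 200, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

beta = np.zeros(d)
for _ in range(iters):
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # fresh sketch each step
    SX = S @ X
    H = SX.T @ SX                       # sketched Hessian, d x d
    grad = X.T @ (y - X @ beta)         # full-data residual correlation
    beta = beta + np.linalg.solve(H, grad)

err = np.linalg.norm(beta - beta_full) / np.linalg.norm(beta_full)
```

Each iteration contracts the error by a factor on the order of $\sqrt{d/m}$, so a handful of steps with a modest sketch size drives the iterate essentially to the full-data solution.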
5. Extensions: Regularized, Constrained, and Structured Problems
Least squares sketching is not limited to unconstrained or unregularized regression.
- Tikhonov/Ridge Regularization: Sketch-based preconditioners for regularized problems reduce the number of LSQR iterations to $O(\log(1/\varepsilon))$, and when the statistical dimension $d_\lambda$ is much smaller than the ambient dimension, sketches of size on the order of $d_\lambda$ suffice (Meier et al., 2022).
- Regularized Optimization (Convex/Nonconvex): Sketching methods extend to general convex and nonconvex regularizers, with sharp approximation and minimax error bounds even for sparse or low-rank recovery (Yang et al., 2023, Chen et al., 2020).
- Constrained Problems and Geometry-Dependent Embeddings: For general convex cones or manifold constraints, the required embedding dimension depends on the tangent cone’s Gaussian width or related, more refined geometric complexity measures (Chen et al., 2020). For rank-constrained LS, recursive importance sketching (RISRO) achieves (super)quadratic convergence by data-dependent, tangent-space-based functional reductions (Luo et al., 2020).
- Tensor-Structured Problems: Sketched LS with tensor-product or rowwise Kronecker sketches enables efficient solution of high-dimensional PDE inverse problems where both data and sketch must respect underlying data structure (Chen et al., 2019, Chen et al., 2020).
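The sketch-based preconditioning idea behind these regularized solvers can be illustrated in the unregularized case: a QR factorization of $SX$ supplies a right preconditioner under which LSQR converges in a handful of iterations even for ill-conditioned $X$. A minimal SciPy-based sketch, with all parameter choices ours:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

# Sketch-and-precondition: the triangular factor R of a QR of SX makes
# X R^{-1} well conditioned, so LSQR on the preconditioned operator
# converges quickly. Sizes, sketch type, and tolerances are illustrative.
rng = np.random.default_rng(5)
n, d, m = 8000, 30, 300
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
X = U * np.logspace(0, 6, d)                 # ill-conditioned design
y = rng.standard_normal(n)

S = rng.standard_normal((m, n)) / np.sqrt(m)
R = np.linalg.qr(S @ X, mode='r')            # right preconditioner

A = LinearOperator(
    (n, d),
    matvec=lambda v: X @ np.linalg.solve(R, v),
    rmatvec=lambda w: np.linalg.solve(R.T, X.T @ w),
)
z = lsqr(A, y, atol=1e-12, btol=1e-12)[0]
beta = np.linalg.solve(R, z)                 # undo the preconditioner

beta_ref = np.linalg.lstsq(X, y, rcond=None)[0]
rel = np.linalg.norm(beta - beta_ref) / np.linalg.norm(beta_ref)
```

The same pattern extends to the ridge case by factoring the sketched augmented matrix $[SX; \sqrt{\lambda} I]$ instead of $SX$.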
6. Stability, Mixed Precision, and Practical Implementation
Forward stability and practical computation are crucial for real applications:
- Stability: Properly implemented iterative sketching algorithms are provably forward stable for overdetermined LS, attaining accuracy comparable to QR solvers with significantly reduced runtime (Epperly, 2023).
- Mixed Precision: Sketch formation and subsequent factorization can be performed in low-precision arithmetic, with controlled error propagation to preconditioners and final solution accuracy via iterative refinement (e.g., GMRES-based IR), especially beneficial for ill-conditioned or large problems (Carson et al., 2024).
- Software and Benchmarks: Optimized libraries, such as Sketch 'n Solve, implement multiple sketching operators (dense, sparse, structured), showing up to 50x speedup versus classical LSQR while maintaining accuracy across a variety of real-world and synthetic datasets (Lavaee, 2024).
7. Empirical Verification and Data-Dependent Practicalities
Extensive empirical evaluations corroborate theoretical risk and complexity predictions:
- Synthetic Data: Experiments across a range of problem sizes (e.g., dimension $800$) show that empirical VE/PE closely matches the closed-form asymptotic formulas (Dobriban et al., 2018, Woodruff, 2014).
- Real Datasets: On large-scale problems (e.g., Million-Song, CIFAR-100), SRHT and adaptive sketching-based PCG solvers outperform classical and non-adaptive solvers in both speed and memory (Dobriban et al., 2018, Lacotte et al., 2021, Lavaee, 2024).
- Structured Data: For PDE and low-rank matrix problems, tensorized and importance-based sketches allow for scalable, accurate recovery (Chen et al., 2019, Chen et al., 2020, Luo et al., 2020).
- Sketch Size Selection: The embedding dimension must be chosen in accordance with the desired error type (residual vs. prediction), effective dimension, and available computational resources. Adaptive mechanisms automate this process in modern approaches (Lacotte et al., 2021, Lacotte et al., 2020).
In summary, least squares sketching encompasses a rich family of randomized dimensionality reduction and preconditioning techniques that underlie scalable, statistically controlled regression for both classical and modern large-scale regimes. Theoretical sharpness, statistical–algorithmic tradeoffs, structural adaptations, and practical stability are all essential facets currently addressed by the leading research in this area (Dobriban et al., 2018, Lacotte et al., 2021, Luo et al., 2020, Woodruff, 2014, Epperly, 2023).