Sequential LS Estimators with Fast Sketching

Updated 10 September 2025
  • The paper introduces SLSE-FRS, a framework that uses sequential refinement of randomized sketches to efficiently estimate high-dimensional linear models.
  • It integrates sketch-and-solve with iterative-sketching methods to progressively increase sketch sizes and rapidly approach OLS-level prediction accuracy.
  • The method provides theoretical guarantees on convergence and relative error while significantly reducing computational costs compared to traditional full-data solvers.

Sequential Least-Squares Estimators with Fast Randomized Sketching (SLSE-FRS) constitute a unified, algorithmic-statistical framework for the efficient estimation of high-dimensional linear statistical models. The SLSE-FRS approach is designed to dramatically accelerate least-squares estimation for very large data matrices by integrating both Sketch-and-Solve and Iterative-Sketching methods. Its core methodological innovation is a staged, sequential refinement strategy: it constructs and solves a chain of sketched least-squares subproblems with progressively increasing sketch sizes, thus yielding estimators that attain (and provably match) the statistical accuracy of the optimal ordinary least-squares (OLS) solution at a fraction of the computational cost (Chen et al., 8 Sep 2025).

1. Foundations: Sketch-and-Solve and Iterative-Sketching

SLSE-FRS is architected by synthesizing two dominant paradigms for randomized least-squares approximation:

  • Sketch-and-Solve: A single random sketching matrix $S \in \mathbb{R}^{m \times N}$ (with $m \ll N$) is applied to the data $(X, Y)$, producing a reduced system

$$\min_{\beta} \frac{1}{2} \|S Y - S X \beta\|^2$$

whose solution,

$$\tilde{\beta} = (X^T S^T S X)^{-1} X^T S^T S Y,$$

delivers a near-optimal estimator with relative-error guarantees and computational complexity $O(N d \log d)$ when using fast transforms such as the Subsampled Randomized Hadamard Transform (SRHT) (0710.1435).

  • Iterative-Sketching: An iterative refinement scheme (notably, the Iterative Hessian Sketch (Pilanci et al., 2014)) successively applies fresh random sketches at each step to the gradient or Hessian, typically yielding updates of the form

$$\beta_{t+1} = \beta_t - H_t^{-1} \nabla f(\beta_t; X, Y), \qquad H_t := X^T S_t^T S_t X,$$

and contracts the solution error at a geometric rate determined by the subspace embedding property of the sketch. (A minimal code sketch of both paradigms appears below.)
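The two paradigms can be illustrated with a short numpy sketch. The Gaussian sketching matrices, problem sizes, and iteration counts below are illustrative assumptions (the methods discussed use fast transforms such as the SRHT); the snippet shows one Sketch-and-Solve estimate followed by a few IHS-style refinement steps.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m = 10_000, 20, 400                      # illustrative sizes, m << N
X = rng.standard_normal((N, d))
Y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

# Sketch-and-Solve: one sketch, one reduced least-squares solve
S = rng.standard_normal((m, N)) / np.sqrt(m)   # Gaussian sketch stands in for SRHT
beta_sas, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)

# Iterative-Sketching (IHS-style): fresh sketch per step, full-data gradient
beta = np.zeros(d)
for _ in range(5):
    S_t = rng.standard_normal((m, N)) / np.sqrt(m)  # fresh sketch S_t
    H_t = (S_t @ X).T @ (S_t @ X)                   # sketched Hessian X^T S_t^T S_t X
    grad = X.T @ (X @ beta - Y)                     # exact gradient of f(beta; X, Y)
    beta -= np.linalg.solve(H_t, grad)              # beta_{t+1} = beta_t - H_t^{-1} grad
```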

SLSE-FRS bridges these by sequentially increasing the sketch size $m_i$: each new sketched subproblem is solved via a strongly preconditioned, momentum-accelerated iterative method (e.g., M-IHS), and the output of each subproblem seeds the next, enabling consistent accuracy improvement. This design circumvents the need for a large, memory-intensive single sketch and, unlike pure iterative-sketching, sidesteps the poor solution quality of small sketch sizes.

2. Sequential Refinement Strategy and Algorithmic Structure

At the heart of SLSE-FRS is a two-stage procedure:

  1. Inner Stage (Sequential Sketches and Warm Start): $K$ sketched subproblems of the form

$$\min_{\beta} \frac{1}{2}\|S_i X \beta - S_i Y\|^2$$

are solved, with sketch size $m_i$ progressively increased as $m_{i+1}/m_i = \rho$ for a fixed ratio $\rho > 1$. The solution $\beta^i$ from subproblem $i$ is used as the warm start for subproblem $i+1$ (where $i = 1, \ldots, K$).

Each subproblem is addressed using an iterative method:

$$\beta_{t+1}^i = \beta_t^i - \mu\hat{H}^{-1}(S_i X)^T(S_i X \beta_t^i - S_i Y) + \eta(\beta_t^i - \beta_{t-1}^i),$$

where $\hat{H} = X^T \hat{S}^T \hat{S} X$ is a fixed Hessian-type sketch-based preconditioner (often built with the SRHT), and $(\mu, \eta)$ are step-size/momentum parameters ensuring geometric convergence.

  2. Outer Stage (Full Data Iterative Refinement): Once the sequence has achieved suboptimality matched to the best attainable error for the final sketch size, additional outer iterations operate on the full dataset with the same preconditioned iterative method. These further reduce the error to the "noise level," i.e., the estimation accuracy of the OLS solution. (An illustrative code sketch of the two-stage procedure follows.)
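The following is a minimal skeleton of the two-stage procedure, under stated assumptions: Gaussian sketches in place of the SRHT, illustrative default sizes and iteration counts, and the momentum parameter $\eta = 53/36 - \sqrt{17}/3$ quoted in Section 3. It is a sketch of the idea, not the authors' reference implementation.

```python
import numpy as np

def slse_frs(X, Y, m0=200, rho=2.0, K=3, inner_iters=4, outer_iters=3, seed=0):
    """Illustrative skeleton of SLSE-FRS (Gaussian sketches, hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu, eta = 1.0, 53/36 - np.sqrt(17)/3       # step/momentum tuning from Section 3

    # Fixed Hessian-type preconditioner H_hat = X^T S_hat^T S_hat X
    m_hat = int(m0 * rho**K)
    S_hat = rng.standard_normal((m_hat, N)) / np.sqrt(m_hat)
    H_hat = (S_hat @ X).T @ (S_hat @ X)

    beta = beta_prev = np.zeros(d)
    m = m0
    # Inner stage: K sketched subproblems, sketch size grown by rho, warm-started
    for _ in range(K):
        S = rng.standard_normal((int(m), N)) / np.sqrt(int(m))
        SX, SY = S @ X, S @ Y
        for _ in range(inner_iters):
            grad = SX.T @ (SX @ beta - SY)     # gradient of the sketched subproblem
            beta, beta_prev = (beta - mu * np.linalg.solve(H_hat, grad)
                               + eta * (beta - beta_prev)), beta
        m *= rho                               # m_{i+1} = rho * m_i

    # Outer stage: full-data refinement with the same preconditioned iteration
    for _ in range(outer_iters):
        grad = X.T @ (X @ beta - Y)
        beta, beta_prev = (beta - mu * np.linalg.solve(H_hat, grad)
                           + eta * (beta - beta_prev)), beta
    return beta
```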

The convergence analysis shows that, for each subproblem, the expected prediction error is multiplied by at most $1/3$ per iteration, up to the floor $\delta_i$:

$$\mathbb{E}\|X(\beta_{a_i}^i - \beta)\| \leq (1/3)^{a_i}\,\mathbb{E}\|X(\beta_0^i - \beta)\| + \left[1 + (1/3)^{a_i}\right]\delta_i,$$

where $\delta_i$ is the theoretical best error attainable for sketch size $m_i$. Accumulating over $T$ total iterations,

$$\mathbb{E}\|X(\beta_T - \beta)\| \leq (1/3)^{T}\,\mathbb{E}\|X(\beta_0 - \beta)\| + [1 + o(1)]\,\mathbb{E}\|X(\hat{\beta} - \beta)\|,$$

achieving OLS-level accuracy with high probability.

3. Computational Complexity and Implementation

By leveraging structure in both the data and the sketch, SLSE-FRS achieves favorable computational complexity:

  • Sketching Step: For a data matrix $X \in \mathbb{R}^{N \times d}$, SRHT sketches can be computed in $O(N d \log N)$ operations.
  • Subproblem Solution: Each sketched subproblem reduces to $O(m_i d^2)$ flops (with $m_i \ll N$), and most iterations in the sequence are performed at sketch sizes smaller than the final one.
  • Total Cost: The dominant cost is $O(N d \log_2 N)$ for sketch generation, with the majority of iterations cheaper than standard full-data iterative solvers. For applications with $N$ up to $2^{20}$ and moderate $d$, this translates into substantial running-time reductions compared to state-of-the-art methods (Chen et al., 8 Sep 2025). (An illustrative flop-count accounting follows this list.)
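To make the accounting concrete, the following back-of-the-envelope script (all sizes hypothetical) compares the one-time fast-transform sketching cost with the summed cost of the sketched subproblem solves under a geometric schedule.

```python
import math

# Hypothetical sizes: N rows, d columns, K subproblems, schedule m_i = m0 * rho^i
N, d, m0, rho, K = 2**20, 50, 2_000, 2.0, 5
sketch_flops = N * d * math.log2(N)                      # ~O(N d log N) SRHT pass
solve_flops = sum(m0 * rho**i * d**2 for i in range(K))  # sum_i O(m_i d^2) solves
print(f"sketching ~ {sketch_flops:.2e} flops; subproblem solves ~ {solve_flops:.2e} flops")
```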

The implementation applies efficient matrix-multiply routines enabled by fast transforms (SRHT or CountSketch) and solves subproblems via block-iterative solvers (typically M-IHS with tuned step/momentum parameters, e.g., $|\mu - 1| \leq 1/4$, $\eta = 53/36 - \sqrt{17}/3$).
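As an example of a cheap sketching operator, here is a minimal CountSketch application (one hash bucket and one random sign per row), which runs in $O(Nd)$ time without ever forming $S$ explicitly; the sizes are illustrative.

```python
import numpy as np

def countsketch_apply(A, m, rng):
    """Apply an implicit CountSketch S (m x N) to A (N x d) in O(N d) time:
    each row of A is added, with a random sign, to one random bucket of SA."""
    N = A.shape[0]
    buckets = rng.integers(0, m, size=N)          # hash each row to a bucket
    signs = rng.choice((-1.0, 1.0), size=N)       # independent Rademacher signs
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)    # accumulate signed rows
    return SA

rng = np.random.default_rng(1)
X = rng.standard_normal((100_000, 30))
SX = countsketch_apply(X, 2_000, rng)             # reduced matrix for a subproblem
```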

4. Statistical Efficiency and Convergence Properties

SLSE-FRS retains the statistical optimality guarantees associated with OLS estimators:

  • Relative-Error Guarantees: For suitable sketch sizes ($m_i \propto d/\epsilon$ for target error $\epsilon$), each subproblem preserves the subspace embedding property, ensuring that the residual satisfies

$$\|X \tilde{\beta}^i - Y\| \leq (1+\epsilon)\,\mathcal{Z},$$

where $\mathcal{Z}$ is the optimal least-squares error (0710.1435, Pilanci et al., 2014). (An illustrative numerical check appears after this list.)

  • Prediction Efficiency: The sequential increase in sketch size ensures that the estimator approaches a regime in which the statistical prediction error (which, if only a single sketch is used, generally requires $m_i$ approaching $N$ for constant error (Raskutti et al., 2014, Raskutti et al., 2015)) is minimized iteratively, thereby matching OLS prediction accuracy over the course of refinement.
  • Noise-Level Convergence: The final outer iterations guarantee that the estimator's mean-squared error contracts to the noise floor, i.e., $\sigma^2 d$, as for the OLS estimator, even when the number of data points $N$ far exceeds $d$.
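The relative-error property is easy to observe numerically; the following demo (Gaussian sketch and hypothetical sizes, used for illustration) compares the sketched residual against the optimal residual $\mathcal{Z}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, m = 20_000, 10, 800
X = rng.standard_normal((N, d))
Y = X @ rng.standard_normal(d) + rng.standard_normal(N)

beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
Z = np.linalg.norm(X @ beta_ols - Y)              # optimal least-squares residual

S = rng.standard_normal((m, N)) / np.sqrt(m)      # Gaussian stand-in for SRHT
beta_sk, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)
print(np.linalg.norm(X @ beta_sk - Y) / Z)        # ratio 1 + eps, typically near 1
```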

5. Comparison with Contemporary Methods

SLSE-FRS is systematically compared with Preconditioned Conjugate Gradient (PCG) and Iterative Double Sketching (IDS) (Lacotte et al., 2019, Lacotte et al., 2020). Key findings highlighted in the evaluation:

  • Speed: SLSE-FRS is empirically shown to be approximately twice as fast as IDS and about three times faster than PCG for representative problem sizes ($N$ up to $2^{20}$, moderate $d$).
  • Convergence Path: In low-dimensional illustrative settings, SLSE-FRS exhibits more stable and concentrated iteration trajectories than IDS.
  • Precision: Both the achieved residual and the prediction error of SLSE-FRS match the theoretical benchmark set by OLS, with the empirical error converging to $\sigma^2 d$.
  • Efficiency: The choice of sketching operator (SRHT or CountSketch) permits further trade-offs; using CountSketch for sketching can reduce initialization cost with negligible effect on convergence or statistical efficiency.
| Method | Computational Cost | Iteration Count | Final Prediction Error |
|---|---|---|---|
| SLSE-FRS | $O(Nd\log_2 N)$ + $O(Nd)$ | Few (due to geometric contraction) | $\sigma^2 d$ (noise level) |
| IDS | Higher (multiple large sketches) | More | Matches OLS (with more iterations) |
| PCG | Highest (full Gram matrix ops) | More | Matches OLS |

6. Extensions: Regularization, Streaming, and Distributed Variants

SLSE-FRS can be adapted and extended in multiple directions:

  • Regularized Least Squares: By incorporating regularization (e.g., Tikhonov or $\ell_1$-based penalties), the framework is compatible with sketching schemes designed for regularized objectives, allowing minimax-optimal rates for sparse estimation (Yang et al., 2023). (A minimal regularized sketch-and-solve variant appears after this list.)
  • Statistical Inference and Bootstrap: Fast randomized sketching enables valid statistical inference (e.g., confidence intervals, hypothesis tests) based on the asymptotic normality of quadratic forms in the sketched estimator (Wang et al., 1 Apr 2024).
  • Distributed and Streaming Scenarios: The modular, low-memory nature of sequential sketching is particularly suitable for distributed environments (e.g., federated learning), using sketch averaging for bias reduction and strong error control (Garg et al., 8 May 2024).
  • Tensor-structured and High-dimensional Data: Extensions to tensor-structured sketches preserve fast update properties and error bounds in multilinear least-squares and low-rank decompositions (Chen et al., 2020, Ma et al., 2021).
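As a concrete instance of the first extension, a minimal Tikhonov-regularized sketch-and-solve step might look as follows; the function name, sizes, and the Gaussian sketch are assumptions for illustration, not the scheme of (Yang et al., 2023).

```python
import numpy as np

def sketched_ridge(X, Y, lam, m, rng):
    """Sketch-and-solve for min_beta 0.5*||S(X beta - Y)||^2 + 0.5*lam*||beta||^2
    (illustrative Tikhonov-regularized variant with a Gaussian sketch)."""
    N, d = X.shape
    S = rng.standard_normal((m, N)) / np.sqrt(m)
    SX, SY = S @ X, S @ Y
    return np.linalg.solve(SX.T @ SX + lam * np.eye(d), SX.T @ SY)
```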

7. Outlook and Limitations

SLSE-FRS represents a new optimal trade-off frontier for large-scale least-squares estimation under resource constraints. It unifies the theoretical guarantees of randomized sketching with practical high-throughput iterative solvers, overcoming known limitations of single-sketch prediction inefficiency and avoiding the high computational demand of full-scale iterative or direct methods.

Noted limitations include the need for appropriate tuning of sketch size progression and iteration counts, as well as the assumption that the subspace embedding guarantees of the chosen sketch are maintained at each stage. For applications with extreme ill-conditioning or nonstandard data distributions, additional stabilization (e.g., adaptive preconditioners (Chen et al., 24 Sep 2024)) or subproblem regularization may be required.

SLSE-FRS thus provides an extensible blueprint for scalable, high-precision linear estimation in modern data analysis, bridging algorithmic and statistical efficiencies with practical implementation and application flexibility.