
Subsampled Randomized Hadamard Transform (SRHT)

Updated 11 November 2025
  • SRHT is a structured random projection method that combines randomized sign-flips, the fast Walsh–Hadamard transform, and uniform subsampling for efficient dimensionality reduction.
  • It provides nearly optimal subspace embeddings with faster geometric error decay than Gaussian projections, enhancing iterative solver performance.
  • SRHT achieves condition-number-free computational complexity and scales efficiently to large datasets, making it ideal for high-precision, distributed environments.

The Subsampled Randomized Hadamard Transform (SRHT) is a structured random projection technique that enables efficient dimensionality reduction, fast randomized matrix algorithms, and nearly optimal subspace embeddings for large-scale computational linear algebra and statistical learning. By combining a randomized diagonal sign-flip, a fast Walsh–Hadamard transform, and uniform subsampling, SRHT matches the embedding quality of dense Gaussian projections at a fraction of the computational cost. Recent advances have rigorously analyzed its spectral properties, convergence rates in iterative sketching, explicit polynomial acceleration schemes, and practical deployment in distributed and high-precision environments.

1. Formal Definition and Algorithmic Structure

Given input dimension $n = 2^p$, the SRHT is typically defined by the matrix

$$S = B\,H_n\,D\,P$$

where:

  • $H_n \in \mathbb{R}^{n \times n}$ is the normalized Walsh–Hadamard matrix, constructed by recursion:

$$H_1 = 1, \quad H_{2k} = \frac{1}{\sqrt{2}} \begin{pmatrix} H_k & H_k \\ H_k & -H_k \end{pmatrix}$$

  • $P$ is a random $n \times n$ permutation matrix.
  • $D = \mathrm{diag}(d_i)$ has i.i.d. Rademacher signs ($d_i = \pm 1$).
  • $B = \mathrm{diag}(b_i)$ is a diagonal sampling matrix with $b_i \sim \mathrm{Bernoulli}(m/n)$.

The all-zero rows are discarded, resulting in a final sketch $\widetilde{S} \in \mathbb{R}^{\widetilde{m} \times n}$ with $\widetilde{m} \sim \mathrm{Binomial}(n, m/n)$, so that $\widetilde{m}/n \approx m/n$.

Application of $S$ to $A \in \mathbb{R}^{n \times d}$ is performed via the fast Walsh–Hadamard transform in $O(nd\log m)$ time. The transform randomizes and spreads the data energy, mitigating coordinate-wise sparsity and enabling near-uniform sampling.
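The pipeline above can be sketched in a few lines of NumPy; `fwht` and `srht_sketch` are illustrative names, not library functions, and this is a minimal reference implementation rather than a tuned one (a production FWHT would be vectorized or written in a compiled language).

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh-Hadamard transform along axis 0 (length must be 2^p);
    returns the orthonormally normalized result."""
    n = x.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h]
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)  # normalize so H_n is orthonormal

def srht_sketch(A, m, rng=None):
    """Compute S A = B H D P A for A (n x d); returns only the retained rows."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    perm = rng.permutation(n)               # P: random permutation
    signs = rng.choice([-1.0, 1.0], size=n) # D: i.i.d. Rademacher signs
    X = fwht(signs[:, None] * A[perm])      # H D P A via the FWHT
    keep = rng.random(n) < m / n            # B: Bernoulli(m/n) row sampling
    return np.sqrt(n / m) * X[keep]         # rescale so E[S^T S] = I
```

Since the normalized $H_n$ is symmetric and orthonormal, applying `fwht` twice recovers the input, which gives a quick correctness check.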

2. Limiting Spectral Distribution and Random Matrix Theory

Let $A = U \Sigma V^\top$, and consider $C_S = U^\top S^\top S U \in \mathbb{R}^{d \times d}$. Under the proportional asymptotic regime,

$$n \to \infty, \quad \frac{d}{n} \to \gamma \in (0,1), \quad \frac{m}{n} \to \xi \in (\gamma,1)$$

the empirical spectral distribution of $C_S$ converges to a deterministic law $F_h$ supported in $(0,1)$, with density

$$f_h(\lambda) = \frac{1}{2\gamma\pi}\,\frac{\sqrt{(\Lambda_h-\lambda)_+\,(\lambda-\lambda_h)_+}}{\lambda(1-\lambda)}$$

where

$$\lambda_h = \left(\sqrt{(1-\gamma)\xi} - \sqrt{(1-\xi)\gamma}\right)^2, \quad \Lambda_h = \left(\sqrt{(1-\gamma)\xi} + \sqrt{(1-\xi)\gamma}\right)^2$$

This explicit description enables precise analysis of the preconditioned Hessian $A^\top S^\top S A$, which is critical to accelerated first-order optimization. The edge eigenvalues control numerical stability and the error contraction rate.
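As a numerical sanity check, the support edges $[\lambda_h, \Lambda_h]$ can be compared against the empirical spectrum. The sketch below uses the Haar (uniformly random orthogonal) model, which shares the same limiting law as SRHT (see the SRHT/Haar row in the comparison table later in this article); all function and variable names are illustrative.

```python
import numpy as np

def edge_eigenvalues(gamma, xi):
    """Support edges [lambda_h, Lambda_h] of the limiting law F_h."""
    lo = (np.sqrt((1 - gamma) * xi) - np.sqrt((1 - xi) * gamma)) ** 2
    hi = (np.sqrt((1 - gamma) * xi) + np.sqrt((1 - xi) * gamma)) ** 2
    return lo, hi

# Empirical check: U is n x d with orthonormal columns (Haar model);
# keeping each row with probability xi gives C = U^T B U, whose spectrum
# concentrates on [lambda_h, Lambda_h] as n grows.
n, d, xi = 2048, 256, 0.5
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
U = Q[:, :d]
keep = rng.random(n) < xi
eigs = np.linalg.eigvalsh(U[keep].T @ U[keep])
lo, hi = edge_eigenvalues(d / n, xi)
```

For $\gamma = 0.125$, $\xi = 0.5$ the predicted support is roughly $[0.17, 0.83]$, and the empirical eigenvalues fall inside it up to finite-size fluctuations.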

3. Polynomial Accelerators and Orthogonal Polynomial Recurrences

For accelerated iterative methods, one constructs normalized orthogonal polynomials $R_t(x)$ with respect to the SRHT spectral measure $\mu(\lambda) = f_h(\lambda)\,d\lambda$. Key steps:

  • Define auxiliary parameters $\alpha, \beta, \tau, c$ as above.
  • The standard orthogonal polynomials $\{\Pi_t(x)\}$ for the Marchenko–Pastur law on $[\alpha, \beta]$ satisfy a three-term recurrence:

$$\Pi_0(x) = 1, \quad \Pi_1(x) = 1 - x, \quad \Pi_t(x) = (1 + \tau - x)\,\Pi_{t-1}(x) - \tau\,\Pi_{t-2}(x)$$

  • The optimal recurrence polynomials for the SRHT-weighted measure are

$$R_t(x) = \frac{\Pi_t(\omega(x-c))}{\Pi_t(-\omega c)}$$

where $\omega = 4/\left(\sqrt{\beta-c}+\sqrt{\alpha-c}\right)^2$. These polynomials realize optimal error decay and are rescaled for use in practical acceleration.
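The three-term recurrence translates directly into a linear-time evaluation routine. The helpers below are hypothetical, with `tau`, `omega`, and `c` supplied by the caller rather than derived here:

```python
def pi_poly(t, x, tau):
    """Evaluate Pi_t(x) via the three-term recurrence
    Pi_0 = 1, Pi_1 = 1 - x, Pi_t = (1 + tau - x) Pi_{t-1} - tau Pi_{t-2}."""
    if t == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - x
    for _ in range(t - 1):
        prev, cur = cur, (1 + tau - x) * cur - tau * prev
    return cur

def r_poly(t, x, tau, omega, c):
    """Normalized accelerator R_t(x) = Pi_t(omega (x - c)) / Pi_t(-omega c)."""
    return pi_poly(t, omega * (x - c), tau) / pi_poly(t, -omega * c, tau)
```

By construction $R_t(0) = 1$ for every $t$, which the normalization in `r_poly` makes explicit.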

4. Optimal First-Order Method Construction

For least-squares minimization

$$\min_x\; \frac{1}{2}\,\|Ax-b\|^2$$

one utilizes preconditioned Heavy-Ball style updates:

$$\begin{cases} x_1 = x_0 + b_1 H_S^{-1} \nabla f(x_0) \\[6pt] x_t = x_{t-1} + b_t H_S^{-1} \nabla f(x_{t-1}) + (1-a_t)(x_{t-2}-x_{t-1}) \end{cases}$$

where $H_S = A^\top S^\top S A$, and $a_t, b_t$ are extracted from the recurrence and normalization of the SRHT polynomial sequence.

The iterates achieve an asymptotic error decay rate determined by the SRHT spectrum,

$$\lim_{n\to\infty}\frac{\|A(x_t-x^*)\|^2}{\|A(x_0-x^*)\|^2} = \int \overline{R}_t^{\,2}(\lambda^{-1})\,d\mu(\lambda)$$

where $\overline{R}_t$ denotes the normalized polynomial. Full formulas for $a_{h,t}, b_{h,t}$ are provided as functions of $\Pi_t(-\omega c)$.
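A schematic NumPy implementation of such preconditioned Heavy-Ball updates is given below. It is written with the usual descent sign convention (so the step coefficients $b_t$ are positive), and the coefficient schedules are caller-supplied callables: the constant schedules used in testing are a stand-in for the SRHT-optimal $a_{h,t}, b_{h,t}$, whose closed forms come from the $\Pi_t$ recurrence.

```python
import numpy as np

def heavy_ball_ls(A, b, H_inv, a_t, b_t, iters, x0=None):
    """Preconditioned Heavy-Ball iteration for min_x 0.5 * ||Ax - b||^2.

    H_inv approximates (A^T S^T S A)^{-1}; a_t, b_t are callables
    t -> momentum / step coefficient at iteration t."""
    x_prev = np.zeros(A.shape[1]) if x0 is None else x0
    # First step: plain preconditioned gradient step.
    x = x_prev - b_t(1) * (H_inv @ (A.T @ (A @ x_prev - b)))
    for t in range(2, iters + 1):
        grad = A.T @ (A @ x - b)
        # Preconditioned gradient step plus momentum toward x_prev.
        x, x_prev = x - b_t(t) * (H_inv @ grad) + (1 - a_t(t)) * (x_prev - x), x
    return x
```

With the exact preconditioner $H^{-1} = (A^\top A)^{-1}$, unit step, and no momentum, the first update is a Newton step and the iteration lands on the least-squares solution immediately, which serves as a correctness check of the update rule.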

5. Comparative Convergence Analysis: SRHT vs Gaussian Sketches

Classical Gaussian projections yield contraction rate

$$\rho_G = \rho = \frac{d}{m}$$

under analogous asymptotics. For SRHT,

$$\rho_h = \rho\,\frac{1-\xi}{1-\gamma} = \frac{\gamma}{\xi}\cdot\frac{1-\xi}{1-\gamma} < 1$$

Given $\xi > \gamma$ (i.e., the sketch size exceeds the data dimension), SRHT delivers strictly faster geometric error decay than Gaussian sketching.

Method       Asymptotic contraction rate
Gaussian     $\rho_G = \gamma/\xi$
SRHT/Haar    $\rho_h = (\gamma/\xi)\cdot\frac{1-\xi}{1-\gamma}$

Equivalently, for a fixed sketch-size ratio $m/n$, SRHT always attains a strictly faster convergence rate than Gaussian sketching.
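The two rates are easy to compare numerically; the helper below is illustrative, simply evaluating the formulas from the table above.

```python
def contraction_rates(gamma, xi):
    """Asymptotic per-iteration contraction rates: (Gaussian, SRHT/Haar).

    gamma = d/n is the data aspect ratio, xi = m/n the sketch-size ratio,
    with 0 < gamma < xi < 1."""
    rho_g = gamma / xi                      # Gaussian: d/m
    rho_h = rho_g * (1 - xi) / (1 - gamma)  # SRHT: extra factor (1-xi)/(1-gamma) < 1
    return rho_g, rho_h
```

For example, with $\gamma = 0.05$ and $\xi = 0.2$, the Gaussian rate is $0.25$ while the SRHT rate is about $0.21$, so SRHT needs fewer iterations for the same accuracy.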

6. Computational Complexity, Scaling, and Condition Number Independence

For accuracy $\|A(x_t-x^*)\|^2 \le \varepsilon\,\|A(x_0-x^*)\|^2$, the required iteration count is

$$t \approx \frac{\log(1/\varepsilon)}{\log(1/\rho_h)}$$

The per-iteration cost is $O(nd)$ for matrix–vector products. The one-time factorization of $H_S$ costs $O(md^2)$. Constructing $SA$ costs $O(nd\log m)$.

Thus,

$$\mathcal{C}_{\text{SRHT}} = O\big(nd\log m + md^2 + nd \cdot \text{iteration count}\big)$$

For the optimal choice $m \sim d$, this becomes

$$O\big(nd\log d + d^3 + nd\log(1/\varepsilon)\big)$$

Notably, the complexity is independent of the condition number $\kappa(A)$. Compared to randomized preconditioned conjugate gradient (PCG), SRHT improves by a factor of $\log d$ in the regime $n \gg d$.
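Plugging a contraction rate into the bound above gives a quick iteration estimate; `srht_iteration_count` is a hypothetical helper, not a library function.

```python
import math

def srht_iteration_count(eps, rho_h):
    """Iterations needed so the relative error ||A(x_t - x*)||^2 /
    ||A(x_0 - x*)||^2 falls below eps, given contraction rate rho_h."""
    return math.ceil(math.log(1 / eps) / math.log(1 / rho_h))
```

For instance, at rate $\rho_h = 0.25$ reaching $\varepsilon = 10^{-6}$ takes about ten iterations, and halving the per-step contraction roughly doubles that count, independent of $\kappa(A)$.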

7. Practical Impact and Implementation Considerations

The SRHT embedding supplies:

  • Explicit limiting spectral distributions for accurate algorithm design.
  • Closed-form optimal polynomial accelerators for Heavy-Ball style iterative methods.
  • Provably faster convergence for sketch-based solvers than Gaussian analogues.
  • Fast matrix-vector products leveraging the FWHT, scaling to massive datasets.
  • Full independence from the conditioning of the data matrix, enabling robust randomized solvers.

Empirically, SRHT-based solvers substantially outperform Gaussian-sketch and PCG solvers when computational or memory constraints preclude dense embeddings, especially when $n/d$ is large. SRHT should be preferred for large-scale least-squares problems, or when the spectral properties of the sketch affect statistical estimator variance (Lacotte et al., 2020).

8. Contextual Significance and Future Directions

Recent random matrix theory results, particularly the derivation of limiting SRHT spectra and the polynomials that arise from them, have enabled both practical deployment and theoretical guarantees for fast, robust linear solvers. The optimal Heavy-Ball algorithm constructed with SRHT embedding provides condition-number-free complexity, which, up to logarithmic factors, is the best known in contemporary randomized numerical linear algebra (Lacotte et al., 2020). Extensions to block-wise distributed architectures, streaming algorithms, and mixed precision computations (e.g., RHQR with SRHT) further demonstrate its versatility across computational environments.
