
Subsampled Randomized Hadamard Transform (SRHT)

Updated 11 November 2025
  • SRHT is a structured random projection method that combines randomized sign-flips, the fast Walsh–Hadamard transform, and uniform subsampling for efficient dimensionality reduction.
  • It provides nearly optimal subspace embeddings with faster geometric error decay than Gaussian projections, enhancing iterative solver performance.
  • SRHT achieves condition-number-free computational complexity and scales efficiently to large datasets, making it ideal for high-precision, distributed environments.

The Subsampled Randomized Hadamard Transform (SRHT) is a structured random projection technique that enables efficient dimensionality reduction, fast randomized matrix algorithms, and nearly optimal subspace embeddings for large-scale computational linear algebra and statistical learning. By combining a randomized diagonal sign-flip, a fast Walsh–Hadamard transform, and uniform subsampling, SRHT matches the embedding quality of dense Gaussian projections at a fraction of the computational cost. Recent advances have rigorously analyzed its spectral properties, convergence rates in iterative sketching, explicit polynomial acceleration schemes, and practical deployment in distributed and high-precision environments.

1. Formal Definition and Algorithmic Structure

Given input dimension $n = 2^p$, the SRHT is typically defined by the matrix

$$S = B\,H_n\,D\,P$$

where:

  • $H_n \in \mathbb{R}^{n \times n}$ is the normalized Walsh–Hadamard matrix, constructed by recursion:

$$H_1 = 1, \quad H_{2k} = \frac{1}{\sqrt{2}} \begin{pmatrix} H_k & H_k \\ H_k & -H_k \end{pmatrix}$$

  • $P$ is a random $n \times n$ permutation matrix.
  • $D = \mathrm{diag}(d_i)$ has i.i.d. Rademacher signs ($d_i = \pm 1$).
  • $B = \mathrm{diag}(b_i)$ is a diagonal sampling matrix with $b_i \sim \mathrm{Bernoulli}(m/n)$.

The all-zero rows are discarded, resulting in a final sketch $\widetilde{S} \in \mathbb{R}^{\widetilde{m} \times n}$ with $\widetilde{m} \sim \mathrm{Binomial}(n, m/n)$, so that $\widetilde{m}/n \approx m/n$.

Application of $S$ to $A \in \mathbb{R}^{n \times d}$ is performed via the fast Walsh–Hadamard transform in $O(nd\log m)$ time. The transform randomizes and spreads the data energy, mitigating coordinate-wise sparsity and enabling near-uniform sampling.
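The pipeline above can be sketched in a few lines of NumPy; `fwht` and `srht_sketch` are illustrative names, not library functions, and this is a minimal reference implementation rather than a tuned one (a production FWHT would be vectorized or written in a compiled language).

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh-Hadamard transform along axis 0 (length must be 2^p);
    returns the orthonormally normalized result."""
    n = x.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h]
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)  # normalize so H_n is orthonormal

def srht_sketch(A, m, rng=None):
    """Compute S A = B H D P A for A (n x d); returns only the retained rows."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    perm = rng.permutation(n)               # P: random permutation
    signs = rng.choice([-1.0, 1.0], size=n) # D: i.i.d. Rademacher signs
    X = fwht(signs[:, None] * A[perm])      # H D P A via the FWHT
    keep = rng.random(n) < m / n            # B: Bernoulli(m/n) row sampling
    return np.sqrt(n / m) * X[keep]         # rescale so E[S^T S] = I
```

Since the normalized $H_n$ is symmetric and orthonormal, applying `fwht` twice recovers the input, which gives a quick correctness check.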

2. Limiting Spectral Distribution and Random Matrix Theory

Let $A = U \Sigma V^\top$, and consider $C_S = U^\top S^\top S U \in \mathbb{R}^{d \times d}$. Under the proportional asymptotic regime,

$$n \to \infty, \quad \frac{d}{n} \to \gamma \in (0,1), \quad \frac{m}{n} \to \xi \in (\gamma,1)$$

the empirical spectral distribution of $C_S$ converges to a deterministic law $F_h$ supported in $(0,1)$, with density

$$f_h(\lambda) = \frac{1}{2\gamma\pi}\,\frac{\sqrt{(\Lambda_h-\lambda)_+\,(\lambda-\lambda_h)_+}}{\lambda(1-\lambda)}$$

where

$$\lambda_h = \left(\sqrt{(1-\gamma)\xi} - \sqrt{(1-\xi)\gamma}\right)^2, \quad \Lambda_h = \left(\sqrt{(1-\gamma)\xi} + \sqrt{(1-\xi)\gamma}\right)^2$$

This explicit description enables precise analysis of the preconditioned Hessian $A^\top S^\top S A$, which is critical to accelerated first-order optimization. The edge eigenvalues control numerical stability and the error contraction rate.
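As a numerical sanity check, the support edges $[\lambda_h, \Lambda_h]$ can be compared against the empirical spectrum. The sketch below uses the Haar (uniformly random orthogonal) model, which shares the same limiting law as SRHT (see the SRHT/Haar row in the comparison table later in this article); all function and variable names are illustrative.

```python
import numpy as np

def edge_eigenvalues(gamma, xi):
    """Support edges [lambda_h, Lambda_h] of the limiting law F_h."""
    lo = (np.sqrt((1 - gamma) * xi) - np.sqrt((1 - xi) * gamma)) ** 2
    hi = (np.sqrt((1 - gamma) * xi) + np.sqrt((1 - xi) * gamma)) ** 2
    return lo, hi

# Empirical check: U is n x d with orthonormal columns (Haar model);
# keeping each row with probability xi gives C = U^T B U, whose spectrum
# concentrates on [lambda_h, Lambda_h] as n grows.
n, d, xi = 2048, 256, 0.5
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
U = Q[:, :d]
keep = rng.random(n) < xi
eigs = np.linalg.eigvalsh(U[keep].T @ U[keep])
lo, hi = edge_eigenvalues(d / n, xi)
```

For $\gamma = 0.125$, $\xi = 0.5$ the predicted support is roughly $[0.17, 0.83]$, and the empirical eigenvalues fall inside it up to finite-size fluctuations.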

3. Polynomial Accelerators and Orthogonal Polynomial Recurrences

For accelerated iterative methods, one constructs normalized orthogonal polynomials $R_t(x)$ with respect to the SRHT spectral measure $\mu(\lambda) = f_h(\lambda)\,d\lambda$. Key steps:

  • Define auxiliary parameters $\alpha, \beta, \tau, c$ as above.
  • The standard orthogonal polynomials $\{\Pi_t(x)\}$ for the Marchenko–Pastur law on $[\alpha, \beta]$ satisfy a three-term recurrence:

$$\Pi_0(x) = 1, \quad \Pi_1(x) = 1 - x, \quad \Pi_t(x) = (1 + \tau - x)\,\Pi_{t-1}(x) - \tau\,\Pi_{t-2}(x)$$

  • The optimal recurrence polynomials for the SRHT-weighted measure are

$$R_t(x) = \frac{\Pi_t(\omega(x-c))}{\Pi_t(-\omega c)}$$

where $\omega = 4/\left(\sqrt{\beta-c}+\sqrt{\alpha-c}\right)^2$. These polynomials realize optimal error decay and are rescaled for use in practical acceleration.
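The three-term recurrence translates directly into a linear-time evaluation routine. The helpers below are hypothetical, with `tau`, `omega`, and `c` supplied by the caller rather than derived here:

```python
def pi_poly(t, x, tau):
    """Evaluate Pi_t(x) via the three-term recurrence
    Pi_0 = 1, Pi_1 = 1 - x, Pi_t = (1 + tau - x) Pi_{t-1} - tau Pi_{t-2}."""
    if t == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - x
    for _ in range(t - 1):
        prev, cur = cur, (1 + tau - x) * cur - tau * prev
    return cur

def r_poly(t, x, tau, omega, c):
    """Normalized accelerator R_t(x) = Pi_t(omega (x - c)) / Pi_t(-omega c)."""
    return pi_poly(t, omega * (x - c), tau) / pi_poly(t, -omega * c, tau)
```

By construction $R_t(0) = 1$ for every $t$, which the normalization in `r_poly` makes explicit.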

4. Optimal First-Order Method Construction

For least-squares minimization

$$\min_x\; \frac{1}{2}\,\|Ax-b\|^2$$

one utilizes preconditioned Heavy-Ball style updates:

$$\begin{cases} x_1 = x_0 + b_1 H_S^{-1} \nabla f(x_0) \\[6pt] x_t = x_{t-1} + b_t H_S^{-1} \nabla f(x_{t-1}) + (1-a_t)(x_{t-2}-x_{t-1}) \end{cases}$$

where $H_S = A^\top S^\top S A$, and $a_t, b_t$ are extracted from the recurrence and normalization of the SRHT polynomial sequence.

The iterates achieve an asymptotic error decay rate determined by the SRHT spectrum,

$$\lim_{n\to\infty}\frac{\|A(x_t-x^*)\|^2}{\|A(x_0-x^*)\|^2} = \int \overline{R}_t^{\,2}(\lambda^{-1})\,d\mu(\lambda)$$

where $\overline{R}_t$ denotes the normalized polynomial. Full formulas for $a_{h,t}, b_{h,t}$ are provided as functions of $\Pi_t(-\omega c)$.
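A schematic NumPy implementation of such preconditioned Heavy-Ball updates is given below. It is written with the usual descent sign convention (so the step coefficients $b_t$ are positive), and the coefficient schedules are caller-supplied callables: the constant schedules used in testing are a stand-in for the SRHT-optimal $a_{h,t}, b_{h,t}$, whose closed forms come from the $\Pi_t$ recurrence.

```python
import numpy as np

def heavy_ball_ls(A, b, H_inv, a_t, b_t, iters, x0=None):
    """Preconditioned Heavy-Ball iteration for min_x 0.5 * ||Ax - b||^2.

    H_inv approximates (A^T S^T S A)^{-1}; a_t, b_t are callables
    t -> momentum / step coefficient at iteration t."""
    x_prev = np.zeros(A.shape[1]) if x0 is None else x0
    # First step: plain preconditioned gradient step.
    x = x_prev - b_t(1) * (H_inv @ (A.T @ (A @ x_prev - b)))
    for t in range(2, iters + 1):
        grad = A.T @ (A @ x - b)
        # Preconditioned gradient step plus momentum toward x_prev.
        x, x_prev = x - b_t(t) * (H_inv @ grad) + (1 - a_t(t)) * (x_prev - x), x
    return x
```

With the exact preconditioner $H^{-1} = (A^\top A)^{-1}$, unit step, and no momentum, the first update is a Newton step and the iteration lands on the least-squares solution immediately, which serves as a correctness check of the update rule.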

5. Comparative Convergence Analysis: SRHT vs Gaussian Sketches

Classical Gaussian projections yield contraction rate

$$\rho_G = \rho = \frac{d}{m}$$

under analogous asymptotics. For SRHT,

$$\rho_h = \rho\,\frac{1-\xi}{1-\gamma} = \frac{\gamma}{\xi}\cdot\frac{1-\xi}{1-\gamma} < 1$$

Given $\xi > \gamma$ (i.e., the sketch size exceeds the data dimension), SRHT delivers strictly faster geometric error decay than Gaussian sketching.

Method       Asymptotic contraction rate
Gaussian     $\rho_G = \gamma/\xi$
SRHT/Haar    $\rho_h = (\gamma/\xi)\cdot\frac{1-\xi}{1-\gamma}$

Equivalently, for a fixed sketch-size ratio $m/n$, SRHT always attains a strictly faster convergence rate than Gaussian sketching.
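The two rates are easy to compare numerically; the helper below is illustrative, simply evaluating the formulas from the table above.

```python
def contraction_rates(gamma, xi):
    """Asymptotic per-iteration contraction rates: (Gaussian, SRHT/Haar).

    gamma = d/n is the data aspect ratio, xi = m/n the sketch-size ratio,
    with 0 < gamma < xi < 1."""
    rho_g = gamma / xi                      # Gaussian: d/m
    rho_h = rho_g * (1 - xi) / (1 - gamma)  # SRHT: extra factor (1-xi)/(1-gamma) < 1
    return rho_g, rho_h
```

For example, with $\gamma = 0.05$ and $\xi = 0.2$, the Gaussian rate is $0.25$ while the SRHT rate is about $0.21$, so SRHT needs fewer iterations for the same accuracy.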

6. Computational Complexity, Scaling, and Condition Number Independence

For accuracy $\|A(x_t-x^*)\|^2 \le \varepsilon\,\|A(x_0-x^*)\|^2$, the required iteration count is

$$t \approx \frac{\log(1/\varepsilon)}{\log(1/\rho_h)}$$

The per-iteration cost is $O(nd)$ for matrix–vector products. The one-time factorization of $H_S$ costs $O(md^2)$. Constructing $SA$ costs $O(nd\log m)$.

Thus,

$$\mathcal{C}_{\text{SRHT}} = O\big(nd\log m + md^2 + nd \cdot \text{iteration count}\big)$$

For the optimal choice $m \sim d$, this becomes

$$O\big(nd\log d + d^3 + nd\log(1/\varepsilon)\big)$$

Notably, the complexity is independent of the condition number $\kappa(A)$. Compared to randomized preconditioned conjugate gradient (PCG), SRHT improves by a factor of $\log d$ in the regime $n \gg d$.
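Plugging a contraction rate into the bound above gives a quick iteration estimate; `srht_iteration_count` is a hypothetical helper, not a library function.

```python
import math

def srht_iteration_count(eps, rho_h):
    """Iterations needed so the relative error ||A(x_t - x*)||^2 /
    ||A(x_0 - x*)||^2 falls below eps, given contraction rate rho_h."""
    return math.ceil(math.log(1 / eps) / math.log(1 / rho_h))
```

For instance, at rate $\rho_h = 0.25$ reaching $\varepsilon = 10^{-6}$ takes about ten iterations, and halving the per-step contraction roughly doubles that count, independent of $\kappa(A)$.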

7. Practical Impact and Implementation Considerations

The SRHT embedding supplies:

  • Explicit limiting spectral distributions for accurate algorithm design.
  • Closed-form optimal polynomial accelerators for Heavy-Ball style iterative methods.
  • Provably faster convergence for sketch-based solvers than Gaussian analogues.
  • Fast matrix-vector products leveraging the FWHT, scaling to massive datasets.
  • Full independence from the conditioning of the data matrix, enabling robust randomized solvers.

Empirically, SRHT-based solvers substantially outperform Gaussian-sketch and PCG solvers when computational or memory constraints preclude dense embeddings, especially when $n/d$ is large. SRHT should be preferred for large-scale least-squares problems, or when the spectral properties of the sketch affect statistical estimator variance (Lacotte et al., 2020).

8. Contextual Significance and Future Directions

Recent random matrix theory results, particularly the derivation of limiting SRHT spectra and the polynomials that arise from them, have enabled both practical deployment and theoretical guarantees for fast, robust linear solvers. The optimal Heavy-Ball algorithm constructed with SRHT embedding provides condition-number-free complexity, which, up to logarithmic factors, is the best known in contemporary randomized numerical linear algebra (Lacotte et al., 2020). Extensions to block-wise distributed architectures, streaming algorithms, and mixed precision computations (e.g., RHQR with SRHT) further demonstrate its versatility across computational environments.
