
TensorSketch: Efficient High-Dimensional Mapping

Updated 23 February 2026
  • TensorSketch is a structured random projection technique that efficiently approximates high-dimensional polynomial kernels and tensor contractions using CountSketch and FFT.
  • It leverages a convolution strategy and FFT to achieve subspace embeddings with strong Johnson–Lindenstrauss guarantees and controlled variance.
  • The method is applied in scalable kernel learning, tensor decompositions, and randomized matrix multiplication, offering practical benefits in regression and latent variable models.

TensorSketch is a structured random projection technique designed for efficient approximation of high-dimensional polynomial kernels, tensor contractions, and large Kronecker-structured linear operators. By fusing CountSketch with the convolution theorem and fast Fourier transforms (FFT), TensorSketch enables fast, low-memory oblivious subspace embeddings for tensor products, Kronecker products, and structured matrix multiplications that arise in kernel learning, tensor decompositions, and large-scale regression problems (Pham et al., 13 May 2025, Ahle et al., 2019, Diao et al., 2017, Wang et al., 2015, Yu et al., 2022).

1. Core Construction and Algorithmic Principles

TensorSketch generalizes CountSketch from vectors to tensor products by defining randomized hash-and-sign sketches over multi-indices. Given $x \in \mathbb{R}^d$ and a fixed integer $p \geq 1$, the homogeneous polynomial kernel $k(x,y) = \langle x, y \rangle^p$ can be approximated by embedding $x$ into $\mathbb{R}^D$ via a data-independent random map $\varphi(x)$, such that

$$\mathbb{E}[\langle \varphi(x), \varphi(y) \rangle] = \langle x, y \rangle^p,$$

with variance decaying as $D$ grows (Pham et al., 13 May 2025).

The construction involves:

  • Sampling $p$ pairs of independent hash functions $h_j: [d] \to [D]$ (typically 3-wise independent) and sign functions $s_j: [d] \to \{\pm 1\}$ (typically 4-wise independent).
  • Defining CountSketches $c_j = C_{h_j,s_j}(x)$ for $j = 1, \ldots, p$.
  • Computing FFTs: $\widehat{c}_j = \mathrm{FFT}(c_j)$.
  • Taking the elementwise (Hadamard) product: $\widehat{c} = \widehat{c}_1 \circ \cdots \circ \widehat{c}_p$.
  • Returning $\varphi(x) = \mathrm{IDFT}(\widehat{c})$.
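The steps above can be sketched in a few lines of NumPy. This is a minimal illustration: it uses fully random hash and sign tables in place of the 3-wise/4-wise independent families, and `tensor_sketch` and all variable names are ours, not from the cited papers.

```python
import numpy as np

def tensor_sketch(x, hashes, signs, D):
    """Degree-p TensorSketch of x: CountSketch p times, FFT each sketch,
    take the Hadamard product in the frequency domain, then inverse FFT."""
    acc = np.ones(D, dtype=complex)
    for h, s in zip(hashes, signs):
        c = np.zeros(D)
        np.add.at(c, h, s * x)        # CountSketch c_j = C_{h_j, s_j}(x)
        acc *= np.fft.fft(c)          # FFT, then elementwise product
    return np.real(np.fft.ifft(acc))  # phi(x) = IDFT(c_hat)

rng = np.random.default_rng(0)
d, D, p = 30, 4096, 2
hashes = [rng.integers(0, D, size=d) for _ in range(p)]
signs = [rng.choice([-1.0, 1.0], size=d) for _ in range(p)]
x, y = rng.standard_normal(d), rng.standard_normal(d)

est = tensor_sketch(x, hashes, signs, D) @ tensor_sketch(y, hashes, signs, D)
exact = (x @ y) ** p                  # <phi(x), phi(y)> estimates <x, y>^p
```

Because both vectors are embedded with the same hash and sign tables, the sketched inner product is an unbiased estimate of $\langle x, y \rangle^p$.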

For general tensors $T \in \mathbb{R}^{n_1 \times \cdots \times n_q}$ or Kronecker products, TensorSketch constructs a global hash $H(i_1,\dots,i_q) = (h_1(i_1) + \cdots + h_q(i_q)) \bmod m$ and sign $S(i_1,\dots,i_q) = s_1(i_1) \cdots s_q(i_q)$, giving a mapping $S = \Omega D \in \mathbb{R}^{m \times N}$. The polynomial kernel embedding, subspace embedding, and approximate matrix multiplication properties follow from this randomization structure (Diao et al., 2017, Ahle et al., 2019).
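The equivalence between this global hash/sign construction and the FFT route can be checked directly for order $q = 2$: CountSketching $x \otimes y$ under $H$ and $S$ equals the circular convolution of the two one-dimensional CountSketches. A small illustrative script (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 6, 64
h1, h2 = rng.integers(0, m, size=d), rng.integers(0, m, size=d)
s1, s2 = rng.choice([-1.0, 1.0], size=d), rng.choice([-1.0, 1.0], size=d)
x, y = rng.standard_normal(d), rng.standard_normal(d)

# Direct sketch of x (x) y with H(i,j) = (h1(i)+h2(j)) mod m, S = s1(i)s2(j)
direct = np.zeros(m)
for i in range(d):
    for j in range(d):
        direct[(h1[i] + h2[j]) % m] += s1[i] * s2[j] * x[i] * y[j]

def countsketch(v, h, s):
    c = np.zeros(m)
    np.add.at(c, h, s * v)
    return c

# Same sketch as a circular convolution of two 1-D CountSketches via FFT
fast = np.real(np.fft.ifft(np.fft.fft(countsketch(x, h1, s1)) *
                           np.fft.fft(countsketch(y, h2, s2))))
```

The two routes agree up to floating-point error, which is exactly why the FFT shortcut avoids ever materializing the $d^2$-dimensional tensor product.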

2. Theoretical Guarantees: Johnson–Lindenstrauss Embedding and Error Bounds

TensorSketch satisfies strong Johnson–Lindenstrauss (JL) moment and tail bounds for polynomial kernel and Kronecker-product subspace embeddings. Formally, for any fixed $\lambda$-dimensional subspace $S \subseteq \mathbb{R}^{d^p}$, with $m = O(p \lambda \epsilon^{-2} \mathrm{poly}\log(1/(\epsilon \delta)))$ rows and probability at least $1 - \delta$, all $x \in S$ satisfy

$$\|Mx\|_2^2 = (1 \pm \epsilon)\|x\|_2^2,$$

where $M$ is a TensorSketch matrix (Ahle et al., 2019). For single kernel approximations, the variance is bounded by

$$\mathrm{Var}[\langle \varphi(x), \varphi(y) \rangle] \leq \frac{3^p - 1}{D} \|x\|^{2p} \|y\|^{2p},$$

producing additive error $O(\epsilon |\langle x, y \rangle|^p)$ with $D = O(3^p \epsilon^{-2} \log(1/\delta))$ (Pham et al., 13 May 2025).
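Treating the hidden constant as 1, this bound translates into a back-of-envelope sketch size. The helper below is illustrative only, not a tuned recommendation:

```python
import math

def sketch_size(p, eps, delta):
    """D = O(3**p * eps**-2 * log(1/delta)), with the constant taken as 1."""
    return math.ceil(3 ** p * eps ** -2 * math.log(1.0 / delta))

# Degree-2 kernel, 10% additive error, failure probability 1%:
D = sketch_size(2, 0.1, 0.01)  # a few thousand rows
```

The $3^p$ factor is the reason higher-degree kernels push toward the tree-structured variants discussed in Section 5.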

For Kronecker regression and low-rank tensor decompositions, TensorSketch with $m = O(R^2 3^{N-1} (R^2 + \epsilon^{-2})/\delta)$ guarantees $(1+\epsilon)$-relative error for each sketched least-squares subproblem, where $R$ is the mode rank and $N$ the tensor order (Yu et al., 2022, Ma et al., 2021, Chen et al., 2023).
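A small self-contained illustration of sketched Kronecker least squares, in our own toy setup (the sizes, sketch dimension $m$, and names are arbitrary): both the design matrix $A_1 \otimes A_2$ and the right-hand side $b$ are compressed by the same TensorSketch, and the small sketched problem is solved in place of the full one.

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2, r1, r2, m = 30, 30, 3, 3, 512
A1 = rng.standard_normal((d1, r1))
A2 = rng.standard_normal((d2, r2))
b = rng.standard_normal(d1 * d2)

h1, h2 = rng.integers(0, m, size=d1), rng.integers(0, m, size=d2)
s1, s2 = rng.choice([-1.0, 1.0], size=d1), rng.choice([-1.0, 1.0], size=d2)

def cs_fft(v, h, s):
    """FFT of the CountSketch of v."""
    c = np.zeros(m)
    np.add.at(c, h, s * v)
    return np.fft.fft(c)

# Sketch every column A1[:, i] (x) A2[:, j] without forming the Kronecker product
SA = np.real(np.fft.ifft(
    np.stack([cs_fft(A1[:, i], h1, s1) * cs_fft(A2[:, j], h2, s2)
              for i in range(r1) for j in range(r2)]), axis=1)).T

# Sketch b with the combined hash H(i,j) = (h1(i)+h2(j)) mod m, sign s1(i)s2(j)
Sb = np.zeros(m)
idx = (h1[:, None] + h2[None, :]) % m
np.add.at(Sb, idx.ravel(), (s1[:, None] * s2[None, :]).ravel() * b)

x_sk = np.linalg.lstsq(SA, Sb, rcond=None)[0]   # solve the sketched problem
K = np.kron(A1, A2)                             # full problem, for comparison only
x_exact = np.linalg.lstsq(K, b, rcond=None)[0]
```

The sketched solution attains a residual within a small factor of the exact least-squares residual, while the sketched system has $m$ rows instead of $d_1 d_2$.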

3. Algorithmic Efficiency: Convolution, FFTs, and Complexity

TensorSketch's core efficiency arises from a polynomial–convolution strategy and use of the FFT:

  • Kronecker-structured sketches are carried out by computing separate FFTs of one-dimensional CountSketches, followed by Hadamard products and an inverse FFT, costing $O(q m \log m)$ per vector and $O(q(n + m \log m))$ per tensor when $q$ is the tensor order (Diao et al., 2017, Pham et al., 13 May 2025).
  • Memory and runtime scale as $O(m)$ and $O(d + m \log m)$ per embedding of a vector $x \in \mathbb{R}^d$, and $O(n(d + m \log m))$ for $n$ vectors (Pham et al., 13 May 2025).

In matrix multiplication, TensorSketch forms $C = S(A)^\top S(B)$ for $A, B \in \mathbb{R}^{n \times n}$, achieving

$$\mathbb{E}[\|C - AB\|_F^2] \leq (n/r)\|AB\|_F^2$$

in $O(n^2 r \log n)$ time (Uffenheimer et al., 4 Feb 2026). For tensor decompositions and randomized ALS, each mode subproblem costs $O(\mathrm{nnz}(\mathcal{T}) + m s R)$, where $s$ is the mode size and $R$ the rank.
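The sketched-multiplication idea is easy to demonstrate in the simplest ($p = 1$) case: with a single CountSketch $S$, the product $(SA)^\top(SB)$ is an unbiased estimate of $A^\top B$. The script below is our own illustration; sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 100, 10000
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# CountSketch matrix S in R^{r x n}: one random +-1 entry per column
h = rng.integers(0, r, size=n)
s = rng.choice([-1.0, 1.0], size=n)
S = np.zeros((r, n))
S[h, np.arange(n)] = s

C = (S @ A).T @ (S @ B)   # unbiased estimate of A^T B
rel = np.linalg.norm(C - A.T @ B) / np.linalg.norm(A.T @ B)
```

In practice $S$ is never materialized as a dense matrix; applying it is a single pass over the nonzeros of $A$ and $B$.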

4. Applications: Kernel Approximation, Regression, and Tensor Decomposition

TensorSketch has central algorithmic roles in:

  • Fast random feature maps for degree-$p$ polynomial kernels in scalable SVMs and ridge regression (Pham et al., 13 May 2025). Empirical results show that for a fixed $D$, TensorSketch's accuracy matches Random Maclaurin and exceeds Rademacher-based alternatives, while running 1–2 orders of magnitude faster than $O(dD)$ methods.
  • Approximate matrix multiplication, with output-sensitive variance bounds; it provably matches the best-known error for unbiased AMM (Uffenheimer et al., 4 Feb 2026).
  • Efficient Kronecker product regression ($\ell_2$ and, via two-stage schemes, $\ell_1$ and general $\ell_p$ for $p < 2$), nonnegative regression, and regularized spline regression in input-sparsity time (Diao et al., 2017).
  • Sketching-based CP, Tucker, tensor train (TT), and tensor ring (TR) decompositions, enabling fast ALS updates by compressing large Khatri–Rao or Kronecker chains to lower-dimensional spaces while controlling error (Wang et al., 2015, Ma et al., 2021, Chen et al., 2023, Yu et al., 2022).
  • Spectral LDA and moment-based latent variable models, by efficiently sketching empirical co-occurrence tensors (Wang et al., 2015).
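As a plug-in random feature map, TensorSketch features can stand in for the exact polynomial kernel. The sketch below (our own illustration; `tensor_sketch_features` is a hypothetical helper, not an API from the cited papers) checks that the feature Gram matrix approximates the exact degree-$p$ kernel matrix, whose $(i,j)$ entry is $\langle x_i, x_j \rangle^p$:

```python
import numpy as np

def tensor_sketch_features(X, D, p, rng):
    """Row-wise TensorSketch features for the degree-p polynomial kernel."""
    n, d = X.shape
    F = np.ones((n, D), dtype=complex)
    for _ in range(p):
        h = rng.integers(0, D, size=d)
        s = rng.choice([-1.0, 1.0], size=d)
        C = np.zeros((D, n))
        np.add.at(C, h, (s * X).T)      # CountSketch every row of X at once
        F *= np.fft.fft(C.T, axis=1)
    return np.real(np.fft.ifft(F, axis=1))

rng = np.random.default_rng(4)
n, d, D, p = 50, 10, 4096, 2
X = rng.standard_normal((n, d))
F = tensor_sketch_features(X, D, p, rng)

G_approx = F @ F.T            # sketched Gram matrix
G_exact = (X @ X.T) ** p      # exact polynomial kernel matrix
rel = np.linalg.norm(G_approx - G_exact) / np.linalg.norm(G_exact)
```

Training a linear model on `F` then approximates kernel learning with the polynomial kernel, at the $O(n(d + D \log D))$ feature-generation cost stated in Section 3.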

5. Extensions, Optimizations, and Variants

Key developments include:

  • Tree-structured (hierarchical) sketching to reduce the exponential dependence on the degree $p$ in the required sketch size $D$ for higher-order kernels (Pham et al., 13 May 2025, Ahle et al., 2019).
  • Hybrid constructions combining TensorSketch with fast Johnson–Lindenstrauss transforms or subsampled randomized Hadamard transforms (SRHT), further improving embedding dimension with trade-offs in application time (Ahle et al., 2019).
  • Higher-order CountSketch structures, e.g., symmetric colliding hashes for symmetric tensors in moment computations, which are important for topic modeling and higher-way spectral decompositions (Wang et al., 2015).
  • Robustness to input coherence and one-pass streaming implementations, in contrast with leverage-score sampling, which requires recomputing scores per iteration and may fail under high coherence (Ma et al., 2021).

6. Empirical Performance and Practical Impact

Empirical studies demonstrate that:

  • In kernel approximation and plug-in random feature maps, TensorSketch yields classification and regression accuracy statistically indistinguishable from the exact polynomial kernel, but dramatically reduces feature generation time—often from hours to minutes (Pham et al., 13 May 2025).
  • For tensor and Kronecker regression problems, TensorSketch enables sublinear algorithms with (1+ϵ)(1+\epsilon)-relative-error guarantees, attaining competitive or improved residuals with modest sketch sizes and input-sparsity runtime (Yu et al., 2022, Diao et al., 2017).
  • In large-scale tensor decompositions (CP, Tucker, TT, TR), TensorSketch-based ALS algorithms converge in a small number of sweeps and attain accuracy within a few percent of full ALS, while allowing scaling to tensors with $10^7$ nonzeros or more (Ma et al., 2021, Chen et al., 2023).
  • For real-world data (images, hyperspectral cubes, videos), TensorSketch-based TT-ALS and TR-ALS deliver order-of-magnitude speedups compared to classical ALS, with negligible loss in quality (Chen et al., 2023, Yu et al., 2022).

7. Historical Context and Comparative Analysis

TensorSketch was introduced by Pagh (2013) and further developed by Pham and Pagh, and by Avron, Nguyen, and Woodruff for subspace embedding applications (Diao et al., 2017, Ahle et al., 2019). It is distinct from naive random projections and random Rademacher features due to its optimal embedding for Kronecker/tensor product structures and its convolutional low-memory design.

Compared to Kronecker-SRFT, TensorSketch is especially effective for high-dimensional, sparse, or streaming data, delivering favorable trade-offs between accuracy, time, and memory for a wide array of modern applications in machine learning, numerical linear algebra, and computational statistics (Yu et al., 2022, Ma et al., 2021, Chen et al., 2023).
