
TensorSketch: Efficient High-Dimensional Mapping

Updated 23 February 2026
  • TensorSketch is a structured random projection technique that efficiently approximates high-dimensional polynomial kernels and tensor contractions using CountSketch and FFT.
  • It leverages a convolution strategy and FFT to achieve subspace embeddings with strong Johnson–Lindenstrauss guarantees and controlled variance.
  • The method is applied in scalable kernel learning, tensor decompositions, and randomized matrix multiplication, offering practical benefits in regression and latent variable models.

TensorSketch is a structured random projection technique designed for efficient approximation of high-dimensional polynomial kernels, tensor contractions, and large Kronecker-structured linear operators. By fusing CountSketch with the convolution theorem and fast Fourier transforms (FFT), TensorSketch enables fast, low-memory oblivious subspace embeddings for tensor products, Kronecker products, and structured matrix multiplications that arise in kernel learning, tensor decompositions, and large-scale regression problems (Pham et al., 13 May 2025, Ahle et al., 2019, Diao et al., 2017, Wang et al., 2015, Yu et al., 2022).

1. Core Construction and Algorithmic Principles

TensorSketch generalizes CountSketch from vectors to tensor products by defining randomized hash-and-sign sketches over multi-indices. Given $x \in \mathbb{R}^d$ and a fixed integer $p \geq 1$, the homogeneous polynomial kernel $k(x,y) = \langle x, y \rangle^p$ can be approximated by embedding $x$ into $\mathbb{R}^D$ via a data-independent random map $\varphi(x)$, such that

$$\mathbb{E}[\langle \varphi(x), \varphi(y) \rangle] = \langle x, y \rangle^p,$$

with variance decaying as $D$ grows (Pham et al., 13 May 2025).

The construction involves:

  • Sampling $p$ pairs of independent hash functions $h_j: [d] \to [D]$ (typically 3-wise independent) and sign functions $s_j: [d] \to \{\pm 1\}$ (typically 4-wise independent).
  • Defining CountSketches $c_j = C_{h_j,s_j}(x)$ for $j = 1, \ldots, p$.
  • Computing FFTs: $\widehat{c}_j = \mathrm{FFT}(c_j)$.
  • Taking the elementwise (Hadamard) product: $\widehat{c} = \widehat{c}_1 \circ \cdots \circ \widehat{c}_p$.
  • Returning $\varphi(x) = \mathrm{IDFT}(\widehat{c})$.
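The steps above can be sketched in a few lines of NumPy. This is a minimal illustration: it uses fully random hash and sign tables in place of the 3-wise/4-wise independent families, and `tensor_sketch` and all variable names are ours, not from the cited papers.

```python
import numpy as np

def tensor_sketch(x, hashes, signs, D):
    """Degree-p TensorSketch of x: CountSketch p times, FFT each sketch,
    take the Hadamard product in the frequency domain, then inverse FFT."""
    acc = np.ones(D, dtype=complex)
    for h, s in zip(hashes, signs):
        c = np.zeros(D)
        np.add.at(c, h, s * x)        # CountSketch c_j = C_{h_j, s_j}(x)
        acc *= np.fft.fft(c)          # FFT, then elementwise product
    return np.real(np.fft.ifft(acc))  # phi(x) = IDFT(c_hat)

rng = np.random.default_rng(0)
d, D, p = 30, 4096, 2
hashes = [rng.integers(0, D, size=d) for _ in range(p)]
signs = [rng.choice([-1.0, 1.0], size=d) for _ in range(p)]
x, y = rng.standard_normal(d), rng.standard_normal(d)

est = tensor_sketch(x, hashes, signs, D) @ tensor_sketch(y, hashes, signs, D)
exact = (x @ y) ** p                  # <phi(x), phi(y)> estimates <x, y>^p
```

Because both vectors are embedded with the same hash and sign tables, the sketched inner product is an unbiased estimate of $\langle x, y \rangle^p$.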

For general tensors $T \in \mathbb{R}^{n_1 \times \cdots \times n_q}$ or Kronecker products, TensorSketch constructs a global hash $H(i_1,\dots,i_q) = (h_1(i_1) + \cdots + h_q(i_q)) \bmod m$ and sign $S(i_1,\dots,i_q) = s_1(i_1) \cdots s_q(i_q)$, giving a mapping $S = \Omega D \in \mathbb{R}^{m \times N}$. The polynomial kernel embedding, subspace embedding, and approximate matrix multiplication properties follow from this randomization structure (Diao et al., 2017, Ahle et al., 2019).
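The equivalence between this global hash/sign construction and the FFT route can be checked directly for order $q = 2$: CountSketching $x \otimes y$ under $H$ and $S$ equals the circular convolution of the two one-dimensional CountSketches. A small illustrative script (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 6, 64
h1, h2 = rng.integers(0, m, size=d), rng.integers(0, m, size=d)
s1, s2 = rng.choice([-1.0, 1.0], size=d), rng.choice([-1.0, 1.0], size=d)
x, y = rng.standard_normal(d), rng.standard_normal(d)

# Direct sketch of x (x) y with H(i,j) = (h1(i)+h2(j)) mod m, S = s1(i)s2(j)
direct = np.zeros(m)
for i in range(d):
    for j in range(d):
        direct[(h1[i] + h2[j]) % m] += s1[i] * s2[j] * x[i] * y[j]

def countsketch(v, h, s):
    c = np.zeros(m)
    np.add.at(c, h, s * v)
    return c

# Same sketch as a circular convolution of two 1-D CountSketches via FFT
fast = np.real(np.fft.ifft(np.fft.fft(countsketch(x, h1, s1)) *
                           np.fft.fft(countsketch(y, h2, s2))))
```

The two routes agree up to floating-point error, which is exactly why the FFT shortcut avoids ever materializing the $d^2$-dimensional tensor product.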

2. Theoretical Guarantees: Johnson–Lindenstrauss Embedding and Error Bounds

TensorSketch satisfies strong Johnson–Lindenstrauss (JL) moment and tail bounds for polynomial kernel and Kronecker-product subspace embeddings. Formally, for any fixed $\lambda$-dimensional subspace $S \subseteq \mathbb{R}^{d^p}$, with $m = O(p \lambda \epsilon^{-2} \mathrm{poly}\log(1/(\epsilon \delta)))$ rows and probability at least $1 - \delta$, all $x \in S$ satisfy

$$\|Mx\|_2^2 = (1 \pm \epsilon)\|x\|_2^2,$$

where $M$ is a TensorSketch matrix (Ahle et al., 2019). For single kernel approximations, the variance is bounded by

$$\mathrm{Var}[\langle \varphi(x), \varphi(y) \rangle] \leq \frac{3^p - 1}{D} \|x\|^{2p} \|y\|^{2p},$$

producing additive error $O(\epsilon |\langle x, y \rangle|^p)$ with $D = O(3^p \epsilon^{-2} \log(1/\delta))$ (Pham et al., 13 May 2025).
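Treating the hidden constant as 1, this bound translates into a back-of-envelope sketch size. The helper below is illustrative only, not a tuned recommendation:

```python
import math

def sketch_size(p, eps, delta):
    """D = O(3**p * eps**-2 * log(1/delta)), with the constant taken as 1."""
    return math.ceil(3 ** p * eps ** -2 * math.log(1.0 / delta))

# Degree-2 kernel, 10% additive error, failure probability 1%:
D = sketch_size(2, 0.1, 0.01)  # a few thousand rows
```

The $3^p$ factor is the reason higher-degree kernels push toward the tree-structured variants discussed in Section 5.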

For Kronecker regression and low-rank tensor decompositions, TensorSketch with $m = O(R^2 3^{N-1} (R^2 + \epsilon^{-2})/\delta)$ guarantees $(1+\epsilon)$-relative error for each sketched least-squares subproblem, where $R$ is the mode rank and $N$ the tensor order (Yu et al., 2022, Ma et al., 2021, Chen et al., 2023).
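A small self-contained illustration of sketched Kronecker least squares, in our own toy setup (the sizes, sketch dimension $m$, and names are arbitrary): both the design matrix $A_1 \otimes A_2$ and the right-hand side $b$ are compressed by the same TensorSketch, and the small sketched problem is solved in place of the full one.

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2, r1, r2, m = 30, 30, 3, 3, 512
A1 = rng.standard_normal((d1, r1))
A2 = rng.standard_normal((d2, r2))
b = rng.standard_normal(d1 * d2)

h1, h2 = rng.integers(0, m, size=d1), rng.integers(0, m, size=d2)
s1, s2 = rng.choice([-1.0, 1.0], size=d1), rng.choice([-1.0, 1.0], size=d2)

def cs_fft(v, h, s):
    """FFT of the CountSketch of v."""
    c = np.zeros(m)
    np.add.at(c, h, s * v)
    return np.fft.fft(c)

# Sketch every column A1[:, i] (x) A2[:, j] without forming the Kronecker product
SA = np.real(np.fft.ifft(
    np.stack([cs_fft(A1[:, i], h1, s1) * cs_fft(A2[:, j], h2, s2)
              for i in range(r1) for j in range(r2)]), axis=1)).T

# Sketch b with the combined hash H(i,j) = (h1(i)+h2(j)) mod m, sign s1(i)s2(j)
Sb = np.zeros(m)
idx = (h1[:, None] + h2[None, :]) % m
np.add.at(Sb, idx.ravel(), (s1[:, None] * s2[None, :]).ravel() * b)

x_sk = np.linalg.lstsq(SA, Sb, rcond=None)[0]   # solve the sketched problem
K = np.kron(A1, A2)                             # full problem, for comparison only
x_exact = np.linalg.lstsq(K, b, rcond=None)[0]
```

The sketched solution attains a residual within a small factor of the exact least-squares residual, while the sketched system has $m$ rows instead of $d_1 d_2$.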

3. Algorithmic Efficiency: Convolution, FFTs, and Complexity

TensorSketch's core efficiency arises from a polynomial–convolution strategy and use of the FFT:

  • Kronecker-structured sketches are carried out by computing separate FFTs of one-dimensional CountSketches, followed by Hadamard products and an inverse FFT, costing $O(q m \log m)$ per vector and $O(q(n + m \log m))$ per tensor when $q$ is the tensor order (Diao et al., 2017, Pham et al., 13 May 2025).
  • Memory and runtime scale as $O(m)$ and $O(d + m \log m)$ per embedding of a vector $x \in \mathbb{R}^d$, and $O(n(d + m \log m))$ for $n$ vectors (Pham et al., 13 May 2025).

In matrix multiplication, TensorSketch forms $C = S(A)^\top S(B)$ for $A, B \in \mathbb{R}^{n \times n}$, achieving

$$\mathbb{E}[\|C - AB\|_F^2] \leq (n/r)\|AB\|_F^2$$

in $O(n^2 r \log n)$ time (Uffenheimer et al., 4 Feb 2026). For tensor decompositions and randomized ALS, each mode subproblem costs $O(\mathrm{nnz}(\mathcal{T}) + m s R)$, where $s$ is the mode size and $R$ the rank.
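The sketched-multiplication idea is easy to demonstrate in the simplest ($p = 1$) case: with a single CountSketch $S$, the product $(SA)^\top(SB)$ is an unbiased estimate of $A^\top B$. The script below is our own illustration; sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 100, 10000
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# CountSketch matrix S in R^{r x n}: one random +-1 entry per column
h = rng.integers(0, r, size=n)
s = rng.choice([-1.0, 1.0], size=n)
S = np.zeros((r, n))
S[h, np.arange(n)] = s

C = (S @ A).T @ (S @ B)   # unbiased estimate of A^T B
rel = np.linalg.norm(C - A.T @ B) / np.linalg.norm(A.T @ B)
```

In practice $S$ is never materialized as a dense matrix; applying it is a single pass over the nonzeros of $A$ and $B$.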

4. Applications: Kernel Approximation, Regression, and Tensor Decomposition

TensorSketch has central algorithmic roles in:

  • Fast random feature maps for degree-$p$ polynomial kernels in scalable SVMs and ridge regression (Pham et al., 13 May 2025). Empirical results show that for a fixed $D$, TensorSketch's accuracy matches Random Maclaurin and exceeds Rademacher-based alternatives, while running 1–2 orders of magnitude faster than $O(dD)$ methods.
  • Approximate matrix multiplication, with output-sensitive variance bounds; it provably matches the best-known error for unbiased AMM (Uffenheimer et al., 4 Feb 2026).
  • Efficient Kronecker product regression ($\ell_2$ and, via two-stage schemes, $\ell_1$ and general $\ell_p$ for $p < 2$), nonnegative regression, and regularized spline regression in input-sparsity time (Diao et al., 2017).
  • Sketching-based CP, Tucker, tensor train (TT), and tensor ring (TR) decompositions, enabling fast ALS updates by compressing large Khatri–Rao or Kronecker chains to lower-dimensional spaces while controlling error (Wang et al., 2015, Ma et al., 2021, Chen et al., 2023, Yu et al., 2022).
  • Spectral LDA and moment-based latent variable models, by efficiently sketching empirical co-occurrence tensors (Wang et al., 2015).
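As a plug-in random feature map, TensorSketch features can stand in for the exact polynomial kernel. The sketch below (our own illustration; `tensor_sketch_features` is a hypothetical helper, not an API from the cited papers) checks that the feature Gram matrix approximates the exact degree-$p$ kernel matrix, whose $(i,j)$ entry is $\langle x_i, x_j \rangle^p$:

```python
import numpy as np

def tensor_sketch_features(X, D, p, rng):
    """Row-wise TensorSketch features for the degree-p polynomial kernel."""
    n, d = X.shape
    F = np.ones((n, D), dtype=complex)
    for _ in range(p):
        h = rng.integers(0, D, size=d)
        s = rng.choice([-1.0, 1.0], size=d)
        C = np.zeros((D, n))
        np.add.at(C, h, (s * X).T)      # CountSketch every row of X at once
        F *= np.fft.fft(C.T, axis=1)
    return np.real(np.fft.ifft(F, axis=1))

rng = np.random.default_rng(4)
n, d, D, p = 50, 10, 4096, 2
X = rng.standard_normal((n, d))
F = tensor_sketch_features(X, D, p, rng)

G_approx = F @ F.T            # sketched Gram matrix
G_exact = (X @ X.T) ** p      # exact polynomial kernel matrix
rel = np.linalg.norm(G_approx - G_exact) / np.linalg.norm(G_exact)
```

Training a linear model on `F` then approximates kernel learning with the polynomial kernel, at the $O(n(d + D \log D))$ feature-generation cost stated in Section 3.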

5. Extensions, Optimizations, and Variants

Key developments include:

  • Tree-structured (hierarchical) sketching to reduce the exponential dependence on the degree $p$ in the required sketch size $D$ for higher-order kernels (Pham et al., 13 May 2025, Ahle et al., 2019).
  • Hybrid constructions combining TensorSketch with fast Johnson–Lindenstrauss transforms or subsampled randomized Hadamard transforms (SRHT), further improving embedding dimension with trade-offs in application time (Ahle et al., 2019).
  • Higher-order CountSketch structures, e.g., symmetric colliding hashes for symmetric tensors in moment computations, which are important for topic modeling and higher-way spectral decompositions (Wang et al., 2015).
  • Robustness to input coherence and one-pass streaming implementations, in contrast with leverage-score sampling, which requires recomputing scores per iteration and may fail under high coherence (Ma et al., 2021).

6. Empirical Performance and Practical Impact

Empirical studies demonstrate that:

  • In kernel approximation and plug-in random feature maps, TensorSketch yields classification and regression accuracy statistically indistinguishable from the exact polynomial kernel, but dramatically reduces feature generation time—often from hours to minutes (Pham et al., 13 May 2025).
  • For tensor and Kronecker regression problems, TensorSketch enables sublinear algorithms with (1+ϵ)(1+\epsilon)-relative-error guarantees, attaining competitive or improved residuals with modest sketch sizes and input-sparsity runtime (Yu et al., 2022, Diao et al., 2017).
  • In large-scale tensor decompositions (CP, Tucker, TT, TR), TensorSketch-based ALS algorithms converge in a small number of sweeps and attain accuracy within a few percent of full ALS, while allowing scaling to tensors with $10^7$ nonzeros or more (Ma et al., 2021, Chen et al., 2023).
  • For real-world data (images, hyperspectral cubes, videos), TensorSketch-based TT-ALS and TR-ALS deliver order-of-magnitude speedups compared to classical ALS, with negligible loss in quality (Chen et al., 2023, Yu et al., 2022).

7. Historical Context and Comparative Analysis

TensorSketch was introduced by Pagh (2013) and further developed by Pham and Pagh, and by Avron, Nguyen, and Woodruff for subspace embedding applications (Diao et al., 2017, Ahle et al., 2019). It is distinct from naive random projections and random Rademacher features due to its optimal embedding for Kronecker/tensor product structures and its convolutional low-memory design.

Compared to Kronecker-SRFT, TensorSketch is especially effective for high-dimensional, sparse, or streaming data, delivering favorable trade-offs between accuracy, time, and memory for a wide array of modern applications in machine learning, numerical linear algebra, and computational statistics (Yu et al., 2022, Ma et al., 2021, Chen et al., 2023).
