Provable Quantization with Randomized Hadamard Transform

Published 13 May 2026 in cs.LG and cs.DS | (2605.13810v1)

Abstract: Vector quantization via random projection followed by scalar quantization is a fundamental primitive in machine learning, with applications ranging from similarity search to federated learning and KV cache compression. While dense random rotations yield clean theoretical guarantees, they require $\Theta(d^2)$ time. The randomized Hadamard transform $HD$ reduces this cost to $O(d \log d)$, but its discrete structure complicates analysis and leads to weaker or purely empirical compression guarantees. In this work, we study a variant of this approach: dithered quantization with a single randomized Hadamard transform. Specifically, the quantizer applies $HD$ to the input vector and subtracts a random scalar offset before quantizing, injecting additional randomness at negligible cost. We prove that this approach is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, we prove that a dithered version of TurboQuant achieves mean squared error $\bigl(\pi\sqrt{3}/2 + o(1)\bigr) \cdot 4^{-b}$ at $b$ bits per coordinate, where the $o(1)$ term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.

Authors (5)

Summary

The paper introduces an unbiased quantization scheme using a single randomized Hadamard transform with dithering, achieving asymptotically optimal MSE scaling as 4^(-b).
It details a two-stage inner product quantization method that decorrelates errors and maintains tight error guarantees while controlling bit complexity.
The proposed method offers O(d log d) computational cost, matching error bounds of dense random rotations and supporting scalable applications in similarity search and federated learning.

Provable Quantization with Randomized Hadamard Transform: A Technical Summary

Problem Overview and Motivation

The paper addresses the fundamental problem of data-oblivious vector quantization in high-dimensional spaces, a core primitive underpinning similarity search, federated learning, and neural network compression. The canonical approach, random projection followed by scalar quantization, has strong theoretical guarantees when dense random rotation matrices are used, but applying such a rotation costs $\Theta(d^2)$ time per vector. In practical large-scale deployments this cost is often prohibitive, motivating structured matrices such as the randomized Hadamard transform ($HD$, the product of a Hadamard matrix $H$ and a random diagonal sign matrix $D$), which admits $O(d \log d)$ matrix-vector products via the fast Walsh-Hadamard transform. However, the discrete, non-isotropic structure of $HD$ has left a theoretical gap: standard analyses and quantization error bounds for i.i.d. Gaussian projections do not extend straightforwardly to this setting, and most prior results are either empirical or incur additional distortion or dimension-dependent slack.
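To make the $O(d \log d)$ claim concrete, here is a minimal NumPy sketch of the transform $x \mapsto HDx$. It assumes $d$ is a power of two, an orthonormal scaling of $H$ by $1/\sqrt{d}$, and i.i.d. uniform random signs on the diagonal of $D$. The function names `fwht`, `randomized_hadamard`, and `inverse_randomized_hadamard` are illustrative and not taken from the paper.

```python
import numpy as np

def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = np.asarray(v, dtype=float).copy()
    d = v.shape[0]
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        # Butterfly step: pair up adjacent blocks of size h and
        # replace them by their sum and their difference.
        blocks = v.reshape(-1, 2, h)
        a = blocks[:, 0, :] + blocks[:, 1, :]
        b = blocks[:, 0, :] - blocks[:, 1, :]
        v = np.stack((a, b), axis=1).reshape(d)
        h *= 2
    # log2(d) passes of O(d) work each, then the orthonormal scaling.
    return v / np.sqrt(d)

def randomized_hadamard(x, signs):
    """Apply HD to x: flip signs with the diagonal D, then apply H."""
    return fwht(signs * x)

def inverse_randomized_hadamard(y, signs):
    """Invert HD: the scaled H is symmetric and orthonormal, and D is its own inverse."""
    return signs * fwht(y)

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                      # unit-norm input
signs = rng.choice([-1.0, 1.0], size=d)     # diagonal of D (assumed i.i.d. uniform signs)
y = randomized_hadamard(x, signs)
x_back = inverse_randomized_hadamard(y, signs)
print(np.linalg.norm(y), np.allclose(x_back, x))
```

Because the scaled $H$ and $D$ are both orthogonal, the transform preserves the Euclidean norm and can be inverted with one more fast transform and a sign flip. A quantizer built on it therefore pays $O(d \log d)$ for both encoding and decoding.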

Main Contributions

The paper introduces and rigorously analyzes a simple yet effective quantization scheme: apply a single randomized Hadamard transform ($HD$), subtract a per-coordinate dither (random uniform offset), and perform scalar quantization using codebooks adapted to the distribution of $HDx$. Theoretical contributions are as follows:

Unbiasedness and Sub-Gaussian Error Control: The quantizer is shown to be unbiased for all unit-norm input vectors, preserving key statistical properties required for downstream applications.
Tight Error Bounds Matching Fully Random Rotations: The authors establish that the mean squared error (MSE) of their scheme asymptotically matches that of quantization after a truly random rotation. Specifically, for $b$ bits per coordinate, the MSE satisfies:

$\sup_{x \in S^{d-1}} \mathbb{E}_{D,U} \|x - \widetilde{x}\|^2_2 \leq \left(\frac{\pi\sqrt{3}}{2} + o(1)\right) 4^{-b}$

uniformly over all $d$ and $x$ as $b \to \infty$. This scaling and leading constant are identical to those of techniques using i.i.d. Gaussian or Haar random rotations, up to vanishing lower-order terms.

Inner Product Quantization with Decorrelation: For inner product estimation, the authors propose a two-stage scheme: quantize $x$ as above, then quantize the residual using another randomized Hadamard transform and explicit scale quantization. The resulting inner product error with an arbitrary vector is shown to be bounded by:

$HD$ 3

for any $HD$ 4 and $HD$ 5.

Provably Low Bit Complexity: The residual quantization stage is shown to require only $HD$ 6 total additional bits, and the entire quantized representation maintains near-linear bit complexity in $HD$ 7.
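As a rough guide to what the MSE bound above means numerically, the following worked evaluation computes only the leading term $\frac{\pi\sqrt{3}}{2} \cdot 4^{-b}$ for a few bit budgets. It drops the $o(1)$ correction, which the summary does not quantify, so these figures are illustrative rather than guaranteed values, especially at small $b$.

```latex
\begin{align*}
\frac{\pi\sqrt{3}}{2} &\approx 2.7207 \\
b = 1: \quad \frac{\pi\sqrt{3}}{2} \cdot 4^{-1} &\approx 2.7207 \times 0.25 \approx 0.680 \\
b = 2: \quad \frac{\pi\sqrt{3}}{2} \cdot 4^{-2} &\approx 2.7207 \times 0.0625 \approx 0.170 \\
b = 3: \quad \frac{\pi\sqrt{3}}{2} \cdot 4^{-3} &\approx 2.7207 \times 0.015625 \approx 0.0425 \\
b = 4: \quad \frac{\pi\sqrt{3}}{2} \cdot 4^{-4} &\approx 2.7207 \times 0.00390625 \approx 0.0106
\end{align*}
```

Since the input is a unit vector, these numbers can be read as relative squared error. Each additional bit per coordinate divides the leading term by 4, a gain of roughly 6 dB.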

Proof Techniques and Technical Ideas

The analysis overcomes the key challenge that the coordinates of $HDx$ do not have exactly Gaussian marginal distributions but are Rademacher sums depending on the input vector $x$. The main elements are:

Exploiting Sub-Gaussianity of Rademacher Sums: For any unit vector $x$, each coordinate $(HDx)_i$ is a normalized Rademacher sum and is therefore sub-Gaussian. This property is leveraged to replace worst-case analysis with a comparison against the standard normal, via carefully tailored comparison inequalities.
Dithered Quantization to Uniformize Local Distributions: Injecting independent uniform random dithers on the quantization grid makes the effective quantization error distribution piecewise uniform over each cell, controlling the position of quantization error even for non-Gaussian input distributions.
First-Order Taylor Expansion and Error Partitioning: The analysis partitions the estimation error into a "central event," where the input falls in a typical range (bucket widths shrink as $b$ grows), and "tails," whose probability mass shrinks rapidly due to sub-Gaussianity. This allows the use of first-order Taylor approximations with tightly bounded higher-order remainders.
Unbiased Variant and Codebook Construction: The paper constructs a subtly modified codebook for unbiasedness: rather than reconstruct quantized values at the bucket midpoint under the transformation $O(d \log d)$ 4, they derive a reconstruction rule $O(d \log d)$ 5 meeting an averaging property that ensures zero expected bias irrespective of the distribution of the quantized value within the bucket.
Residual Quantizer for Inner Product Estimation: To decorrelate the quantization error and facilitate unbiased inner product estimation, another $HD$ transform with scale quantization and randomized sign coding is used, with careful moment analysis (including Rademacher mixed fourth moments) providing tight error guarantees.
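The role of dithering described above can be illustrated with classical subtractive dithering on a uniform grid. This is a simplified stand-in, not the paper's construction: the paper uses codebooks adapted to the distribution of $HDx$ and a modified reconstruction rule, neither of which is reproduced here. The names `dithered_quantize` and `step` and the test value 0.3 are assumptions made for the example.

```python
import numpy as np

def dithered_quantize(z, step, rng):
    """Subtractive dithering on a uniform grid of width `step` (illustrative only)."""
    # Dither drawn uniformly over one full cell; encoder and decoder are
    # assumed to share it (for example through a common random seed).
    u = rng.uniform(-step / 2, step / 2, size=z.shape)
    idx = np.round((z - u) / step)   # integer code that would be stored
    return step * idx + u            # decoder adds the dither back

rng = np.random.default_rng(0)
trials, step = 200_000, 0.25

# A fixed input that sits off the grid: deterministic rounding is biased here.
z = np.full(trials, 0.3)

plain = step * np.round(z / step)          # no dither
dith = dithered_quantize(z, step, rng)     # with dither

plain_bias = float(np.mean(plain - z))
dith_err = dith - z
dith_bias = float(np.mean(dith_err))
dith_var = float(np.var(dith_err))

print("bias without dither:", plain_bias)
print("bias with dither:   ", dith_bias)
print("error variance vs step^2/12:", dith_var, step**2 / 12)
```

With the dither, the reconstruction error is uniform on one cell regardless of where the input lies, so it has zero mean and variance `step**2 / 12`. This is the sense in which dithering removes dependence on the (non-Gaussian) input distribution in this toy setting.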

Comparison to Prior Work

Prior work on randomized Hadamard-based quantization has either relied on heuristic or empirical evidence for performance, or has provided provable guarantees only under looser error bounds or requiring compositions of multiple Hadamard blocks (at increased complexity) to approximate Haar-random rotations. Here, a single randomized Hadamard transform plus dithering suffices for optimal asymptotic MSE, matching the established bounds for the fully random case and strictly improving upon prior provable guarantees for structured transforms. Furthermore, the construction is both algorithmically simple and rapidly computable.

Implications and Future Directions

Practically, these results justify the widespread deployment of Hadamard-based fast quantization schemes in large-scale machine learning, especially for inference-time tasks (transformer cache compression, federated learning communication, etc.) where reliable worst-case error control is essential. From a theoretical perspective, the analysis clarifies the sufficiency of a single randomized Hadamard transform, under proper dithering, for achieving distortion-optimal quantization, as opposed to requiring more complex structured random matrices.

Potential future research directions include:

Extending the analysis to broader classes of structured matrices beyond Hadamard (e.g., other orthogonal transforms or products with sparsity constraints).
Investigating adaptive or data-aware quantization codebooks, where one exploits prior knowledge of the data distribution.
Refining constants and non-asymptotic gap terms for small $b$ or finite $d$, and further optimizing bit accounting and entropy coding procedures.
Empirical validation across downstream systems and hardware-integrated implementations.

Conclusion

This work provides a tight, fully provable theoretical foundation for data-oblivious vector quantization based on the randomized Hadamard transform complemented with uniform dithering. The algorithm achieves unbiasedness and MSE guarantees matching dense random rotation-based quantization while maintaining $O(d \log d)$ computational cost. Moreover, it extends to inner product estimation and supports concise bit representations. This bridges an important theoretical gap in high-dimensional vector quantization and aligns provable efficiency with practical deployment needs.