Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven

Published 7 May 2026 in cs.LG, cs.AI, cs.DS, and cs.NI | (2605.06014v1)

Abstract: Uniform random rotations (URRs) are a common preprocessing step in modern quantization approaches used for gradient compression, inference acceleration, KV-cache compression, model weight quantization, and approximate nearest-neighbor search in vector databases. In practice, URRs are often replaced by randomized Hadamard transforms (RHTs), which preserve orthogonality while admitting fast implementations. The remaining issue is the performance for worst-case inputs. With a URR, each coordinate is individually distributed as a shifted beta distribution, which converges to a Gaussian distribution in high dimensions. Generally, one RHT is not suitable in the worst case, as individual coordinates can be far from these distributions. We show that after composing two RHTs on any $d$-sized input vector, the marginal distribution of every fixed coordinate of the normalized rotated vector is within $O(d^{-1/2})$ of a standard Gaussian both in Kolmogorov distance and in $1$-Wasserstein distance. We then plug these bounds into the analyses of modern compression schemes, namely DRIVE and QUIC-FL, and show that two RHTs achieve performance that asymptotically matches URRs. However, we show that two RHTs may not be sufficient for Vector Quantization (VQ), which often requires weak correlation across fixed-size blocks of coordinates (as opposed to only marginal distribution convergence for single coordinates). We prove that a composition of three RHTs leads to decaying coordinate covariance. This ensures that any fixed, bounded, multi-dimensional VQ codebook optimized for URRs has the same expected error when using three RHTs, up to an additive term that vanishes with the dimension. Finally, because practical inputs are rarely adversarial, we propose a linear-time ${O}(d)$ check on the input's moments to dynamically adapt the number of RHTs used at runtime to improve performance.


Authors (5)

Summary

The paper demonstrates that two RHTs yield coordinate-wise Gaussianity with an O(d⁻¹ᐟ²) error, matching DRIVE and QUIC-FL's optimal performance.
It shows that a third RHT overcomes block decorrelation issues in vector quantization, achieving nearly independent Gaussian blocks.
An adaptive, linear-time layer selection scheme is introduced to balance worst-case guarantees with practical efficiency.

Provably Efficient Quantization with Randomized Hadamard Transforms

Introduction

Randomized orthogonal transformations, particularly uniform random rotations (URRs), are foundational in quantization and compression for distributed learning, model inference, and approximate nearest neighbor (ANN) search. The smoothing imparted by a URR makes individual coordinates approximately Gaussian and blocks of coordinates approximately independent, which underpins the guarantees of modern quantization schemes such as DRIVE and QUIC-FL. However, while URRs have ideal statistical properties, their $O(d^2)$ computational cost is prohibitive in high dimensions. This has led to the heuristic adoption of randomized Hadamard transforms (RHTs), which are fast orthogonal surrogates computable in $O(d \log d)$ time, despite their weaker worst-case guarantees. The paper "Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven" (2605.06014) provides the first dimension-dependent, worst-case theoretical analysis that closes the gap between heuristic RHT usage and URRs, yielding nearly optimal guarantees with a small, fixed number of fast RHT layers.

Analytical Framework

The key technical insight is that composing multiple RHTs, each of which applies a random diagonal sign flip followed by a Hadamard matrix, is sufficient to regain the strong probabilistic properties of URRs. The authors establish that for any input vector, including adversarial or sparse ones, the marginal distribution of each fixed coordinate of the normalized output after two RHTs is within $\mathcal{O}(d^{-1/2})$ of a standard normal in both the Kolmogorov and Wasserstein-1 metrics. This enables quantization schemes designed for Gaussian coordinates, such as DRIVE and QUIC-FL, to achieve their theoretical guarantees up to a vanishing error. However, for vector quantization (VQ) tasks that require multi-dimensional block decorrelation, two RHTs are insufficient: perfect conditional dependencies may persist for specially constructed sparse vectors. The authors prove that a third RHT eliminates this bottleneck, yielding decaying block covariance and thus enabling near-optimal performance for VQ with universal codebooks.
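As a rough illustration of what composing RHT layers looks like, the sketch below applies one and two layers to a single-spike input. It assumes the common convention that one RHT is a random sign flip followed by an orthonormal Walsh-Hadamard transform; the helper names (`fwht`, `rht`, `compose_rhts`) and the choice of test vector are illustrative and not taken from the paper.

```python
import math
import random


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    a = list(v)
    d = len(a)
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        for start in range(0, d, 2 * h):
            for i in range(start, start + h):
                a[i], a[i + h] = a[i] + a[i + h], a[i] - a[i + h]
        h *= 2
    scale = 1.0 / math.sqrt(d)
    return [scale * t for t in a]  # scaling keeps the transform orthogonal


def rht(v, rng):
    """One RHT layer (assumed convention): random sign flips, then Hadamard."""
    return fwht([t if rng.random() < 0.5 else -t for t in v])


def compose_rhts(v, layers, rng):
    """Apply `layers` independent RHT layers in sequence."""
    out = list(v)
    for _ in range(layers):
        out = rht(out, rng)
    return out


rng = random.Random(0)
d = 1024
x = [1.0] + [0.0] * (d - 1)  # illustrative sparse input: a single spike

one = compose_rhts(x, 1, rng)
two = compose_rhts(x, 2, rng)

# Distinct magnitudes of the normalized coordinates sqrt(d) * y_i.
mags_one = {round(abs(t) * math.sqrt(d), 9) for t in one}
mags_two = {round(abs(t) * math.sqrt(d), 9) for t in two}

print("distinct magnitudes after 1 layer:", len(mags_one))
print("distinct magnitudes after 2 layers:", len(mags_two))
print("squared norm after 2 layers:", sum(t * t for t in two))
```

For this input, one layer leaves every normalized coordinate with magnitude exactly 1, which is far from Gaussian, while the second layer spreads the magnitudes out. Each layer costs $O(d \log d)$ operations and preserves the Euclidean norm.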

Strong Numerical and Theoretical Results

Scalar Quantization

Marginal Law: For any input $x\in\mathbb{R}^d$, after two RHTs, the distribution of a fixed coordinate of the normalized output is $\mathcal{O}(d^{-1/2})$-close to a standard normal, simultaneously in the Kolmogorov and Wasserstein-1 metrics. Thus, the distribution "forgets" the input structure as $d\to\infty$.
DRIVE Guarantees: For both the biased and unbiased estimators in the DRIVE algorithm, the theoretical vNMSE with 2-RHT matches that under URR, including the limiting constants $(1-2/\pi)$ (biased) and $(\pi/2-1)$ (unbiased), up to an additive $\mathcal{O}(d^{-1/2})$ error. This tightens prior RHT analyses, which showed only a loose upper bound of $0.5$ (biased) or no bound at all (unbiased).
QUIC-FL and Bandwidth: For bounded support quantization (BSQ) as in QUIC-FL, the fraction of outlier coordinates (those that must be transmitted at higher precision) is also asymptotically optimal. The worst-case multiplicative penalty that 1-RHT incurs above the Gaussian rate disappears with 2-RHT, leaving only an additive term.
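The DRIVE constants above can be checked empirically with a small simulation. The sketch below assumes the usual description of DRIVE's 1-bit scheme: send the signs of the rotated vector $y$ plus one scale $S$, with $S = \|y\|_1/d$ for the biased estimator and $S = \|x\|_2^2/\|y\|_1$ for the unbiased one. Those scale formulas, the helper names, and the spike test input are assumptions made for illustration; they are not spelled out in this summary.

```python
import math
import random


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    a = list(v)
    d = len(a)
    h = 1
    while h < d:
        for start in range(0, d, 2 * h):
            for i in range(start, start + h):
                a[i], a[i + h] = a[i] + a[i + h], a[i] - a[i + h]
        h *= 2
    scale = 1.0 / math.sqrt(d)
    return [scale * t for t in a]


def rotate(v, layers, rng):
    """Compose `layers` RHTs (random sign flips followed by Hadamard)."""
    out = list(v)
    for _ in range(layers):
        out = fwht([t if rng.random() < 0.5 else -t for t in out])
    return out


def drive_vnmse(x, layers, rng):
    """Normalized squared error of 1-bit sign quantization for one rotation.

    The error is measured in the rotated domain; because the rotation is
    orthogonal, it equals the error after applying the inverse rotation.
    """
    d = len(x)
    y = rotate(x, layers, rng)
    l1 = sum(abs(t) for t in y)
    l2_sq = sum(t * t for t in x)
    signs = [1.0 if t >= 0 else -1.0 for t in y]

    def err(scale):
        return sum((t - scale * s) ** 2 for t, s in zip(y, signs)) / l2_sq

    biased = err(l1 / d)        # assumed biased scale: ||y||_1 / d
    unbiased = err(l2_sq / l1)  # assumed unbiased scale: ||x||_2^2 / ||y||_1
    return biased, unbiased


rng = random.Random(0)
d, trials = 4096, 8
x = [1.0] + [0.0] * (d - 1)  # illustrative sparse input

results = [drive_vnmse(x, 2, rng) for _ in range(trials)]
avg_biased = sum(r[0] for r in results) / trials
avg_unbiased = sum(r[1] for r in results) / trials

print("biased vNMSE  :", avg_biased, "vs 1 - 2/pi =", 1 - 2 / math.pi)
print("unbiased vNMSE:", avg_unbiased, "vs pi/2 - 1 =", math.pi / 2 - 1)
```

Under these assumptions the per-vector errors simplify to $1 - \|y\|_1^2/(d\|x\|_2^2)$ and $d\|x\|_2^2/\|y\|_1^2 - 1$, so they approach $1-2/\pi$ and $\pi/2-1$ whenever the normalized coordinates behave like standard Gaussians, for which $\mathbb{E}|z| = \sqrt{2/\pi}$.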

Vector Quantization (VQ)

Conditional Covariance Bottleneck: The authors construct an explicit family of sparse inputs showing that a composition of two RHTs does not suffice for block decorrelation: conditional correlations between coordinates may remain of constant order instead of vanishing as $d$ grows.
Three RHTs for Decorrelation: Composing a third RHT drives the root-mean-square conditional covariance between any pair of coordinates in a block toward zero as $d$ grows. Blocks therefore converge to i.i.d. Gaussian blocks, which preserves the theoretical guarantees of standard VQ codebooks.
Universality for Codebooks: For any fixed, bounded VQ codebook, the expected quantization error under three RHTs converges to its value under a URR or standard Gaussian input, up to an additive term that vanishes as $d\to\infty$.
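The two-versus-three-layer gap can be seen on a toy example. The sketch below uses a 2-sparse input chosen here for illustration (it is not necessarily the family constructed in the paper) and assumes the Sylvester-ordered Hadamard matrix with the sign-flip-then-Hadamard convention. For this input, two RHTs always produce coordinates $2k$ and $2k+1$ with identical magnitudes, a perfect dependency inside each block of size two, and a third layer breaks it.

```python
import math
import random


def fwht(v):
    """Orthonormal Walsh-Hadamard transform in Sylvester (natural) order."""
    a = list(v)
    d = len(a)
    h = 1
    while h < d:
        for start in range(0, d, 2 * h):
            for i in range(start, start + h):
                a[i], a[i + h] = a[i] + a[i + h], a[i] - a[i + h]
        h *= 2
    scale = 1.0 / math.sqrt(d)
    return [scale * t for t in a]


def rotate(v, layers, rng):
    """Compose `layers` RHTs (random sign flips followed by Hadamard)."""
    out = list(v)
    for _ in range(layers):
        out = fwht([t if rng.random() < 0.5 else -t for t in out])
    return out


def max_pair_gap(y):
    """Largest magnitude difference between coordinates 2k and 2k+1."""
    return max(abs(abs(y[2 * k]) - abs(y[2 * k + 1])) for k in range(len(y) // 2))


rng = random.Random(1)
d = 256
c = 1.0 / math.sqrt(2.0)
x = [c, c] + [0.0] * (d - 2)  # illustrative 2-sparse unit vector

# Worst gap over several independent draws of the random signs.
gap_two = max(max_pair_gap(rotate(x, 2, rng)) for _ in range(20))
gap_three = max(max_pair_gap(rotate(x, 3, rng)) for _ in range(20))

print("max |  |y_2k| - |y_2k+1|  | after 2 RHTs:", gap_two)
print("max |  |y_2k| - |y_2k+1|  | after 3 RHTs:", gap_three)
```

The reason is that the first layer maps this input onto either the even or the odd coordinates only, and rows $i$ and $i \oplus 1$ of the Hadamard matrix agree up to sign on such a support. A codebook tuned for independent Gaussian pairs would see a degenerate distribution in this case, which is the kind of failure the third layer is meant to rule out.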

Adaptive RHT Layer Selection

Recognizing that most natural (non-adversarial) inputs are already approximately "flat", with moments close to those of a well-spread vector, the authors provide a linear-time $O(d)$ scheme to check input conformance. If the vector is sufficiently isotropic, only one RHT is required; otherwise, the system applies the minimal number of RHTs needed to guarantee the required marginal or joint normality. Practical systems can therefore adapt for performance without giving up the worst-case guarantees.
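A minimal sketch of such a runtime check is shown below. The summary does not state which moments or thresholds the paper uses, so everything specific here is an assumption: the two statistics (a normalized peak and a normalized fourth moment), the placeholder thresholds `peak_tol` and `m4_tol`, and the function names are all hypothetical.

```python
import math


def flatness_stats(x):
    """Single O(d) pass: normalized peak and normalized fourth moment.

    Both statistics equal 1 for a perfectly flat vector (all entries of equal
    magnitude) and grow as the mass concentrates on few coordinates.
    """
    d = len(x)
    s2 = s4 = peak = 0.0
    for t in x:
        t2 = t * t
        s2 += t2
        s4 += t2 * t2
        peak = max(peak, abs(t))
    if s2 == 0.0:
        return 0.0, 0.0  # zero vector: nothing to spread out
    return peak * math.sqrt(d / s2), d * s4 / (s2 * s2)


def choose_num_rhts(x, need_block_decorrelation=False, peak_tol=6.0, m4_tol=4.0):
    """Pick the number of RHT layers (thresholds are hypothetical placeholders)."""
    peak, m4 = flatness_stats(x)
    if peak <= peak_tol and m4 <= m4_tol:
        return 1  # input already looks flat: one layer
    # Otherwise fall back to the worst-case prescription: two layers for
    # scalar schemes, three when blocks must be decorrelated (VQ).
    return 3 if need_block_decorrelation else 2


flat = [1.0] * 64
spike = [1.0] + [0.0] * 63

print(choose_num_rhts(flat))
print(choose_num_rhts(spike))
print(choose_num_rhts(spike, need_block_decorrelation=True))
```

The check touches each coordinate once, so its cost is dominated by the $O(d \log d)$ transform it may save. In a real system the thresholds would have to be derived from the paper's conditions rather than picked by hand as they are here.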

Implications and Future Directions

This work fundamentally recharacterizes the role of structured random transformations in quantization under worst-case conditions. It demonstrates that a bounded, data-independent number of fast RHT compositions suffices to recover the strong distributional properties of URR in both scalar and block quantization. These results not only justify the widespread heuristic usage of RHT but also enable practitioners to reason about system error and bandwidth provisioning with rigor.

Practically, these findings facilitate the robust deployment of fast quantization schemes for training and inference acceleration, federated learning, privacy-preserving aggregation, and ANN search, supporting both worst-case and average-case settings. Theoretically, the modular Berry-Esseen and Stein's method-based analysis unifies scalar and block marginal control, and suggests new avenues for tight normal approximation with structured transforms.

A natural next step is to extend these results to settings requiring complex independence topologies, such as data-dependent adaptivity in VQ or higher-order block structures, which may require analysis of compositions beyond three RHTs. The techniques herein may also inspire further improvements in fast unitary randomization primitives for kernel approximation, privacy, and error correction.

Conclusion

By bridging a crucial gap between theory and high-performance practice, this work elevates randomized Hadamard transforms from heuristic surrogates to mathematically justified, nearly optimal randomizers for quantization and compression. The guarantees, namely $\mathcal{O}(d^{-1/2})$ Gaussianity after two RHTs and block decorrelation after three, hold for all vectors, not just typical or random inputs. The resulting prescriptions clarify and solidify the deployment of a core primitive across distributed and private machine learning algorithms and vector search systems. The adaptive, linear-time check further separates worst-case protection from typical-case speed, retaining theoretical correctness without incurring unnecessary overhead (2605.06014).