Residual Quantization RQ-KMeans
- Residual Quantization (RQ)-KMeans is a multi-stage vector quantizer that sequentially encodes data residuals using k-means, enabling efficient compression and similarity search.
- It iteratively refines quantization through layered codebooks and strategies like beam search, reducing mean squared error between original and reconstructed data.
- Enhanced variants, such as Regularized Residual Quantization (RRQ) and neural adaptations, address limitations in high dimensions by incorporating regularization and learned transformations.
Residual Quantization with K-Means (RQ-KMeans) is a multi-stage vector quantization framework in which a sequence of codebooks is trained to successively quantize the residual errors of data points. RQ-KMeans forms the foundation of a family of methods for data compression, large-scale similarity search, and image representation. Recent advances—including Regularized Residual Quantization (RRQ) and neural codebook instantiations—address core limitations of classical RQ-KMeans, particularly in high-dimensional regimes.
1. Fundamentals of Residual Quantization with K-Means
Given a dataset $X = \{x_i\}_{i=1}^{N} \subset \mathbb{R}^d$, RQ-KMeans constructs an $M$-layer hierarchical quantizer as follows:
- Stage 0: Set the initial residuals $r_i^{(0)} = x_i$.
- Stage $m = 1, \dots, M$: Learn a codebook $C^{(m)} = \{c_1^{(m)}, \dots, c_K^{(m)}\}$ using k-means on the residuals $\{r_i^{(m-1)}\}$ from the previous stage. Assign each residual to the nearest centroid,
$$k_i^{(m)} = \arg\min_{k} \big\| r_i^{(m-1)} - c_k^{(m)} \big\|_2^2,$$
and update the residual:
$$r_i^{(m)} = r_i^{(m-1)} - c_{k_i^{(m)}}^{(m)}.$$
- Reconstruction: After $M$ stages, reconstruct as
$$\hat{x}_i = \sum_{m=1}^{M} c_{k_i^{(m)}}^{(m)}.$$
Each layer alternates nearest-centroid assignments and centroid updates in the usual k-means (expectation–maximization-style) fashion. The quantizer seeks to minimize the final mean squared error (MSE) between $x_i$ and $\hat{x}_i$; a minimal training sketch follows.
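The layered training loop is short to write down. The following is a minimal sketch using scikit-learn's k-means at each layer; the function name and the toy parameter values are illustrative, not taken from the cited papers:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_rq_kmeans(X, num_layers=4, K=256, seed=0):
    """Fit one k-means codebook per layer on the residuals of the previous layer."""
    residual = X.copy()
    codebooks = []
    for m in range(num_layers):
        km = KMeans(n_clusters=K, n_init=4, random_state=seed + m).fit(residual)
        codebooks.append(km.cluster_centers_)                    # C^(m), shape (K, d)
        assignments = km.predict(residual)                       # k_i^(m)
        residual = residual - km.cluster_centers_[assignments]   # r^(m)
    return codebooks
```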
Encoding and Decoding
- Encoding: Sequential greedy assignment at each layer, or approximate global optimization using beam search to overcome early-stage assignment errors.
- Decoding: Simple summation over selected codewords.
RQ-KMeans thus decomposes the quantization task into a deep stack of simpler k-means quantizations (Ferdowsi et al., 2017, Liu et al., 2015, Yuan et al., 2015, Huijben et al., 2024, Vallaeys et al., 6 Jan 2025).
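Greedy encoding and decoding against codebooks trained as above then reduce to nearest-centroid search and codeword summation. A minimal sketch (function names are illustrative, continuing the hypothetical `train_rq_kmeans` output):

```python
import numpy as np

def encode_greedy(X, codebooks):
    """Sequentially assign each residual to its nearest codeword in every layer."""
    residual = X.copy()
    codes = []
    for C in codebooks:
        # squared distances from every residual to all K centroids: shape (N, K)
        d2 = ((residual[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        idx = d2.argmin(axis=1)
        codes.append(idx)
        residual = residual - C[idx]
    return np.stack(codes, axis=1)          # (N, num_layers) integer codes

def decode(codes, codebooks):
    """Reconstruction is simply the sum of the selected codewords."""
    return sum(C[codes[:, m]] for m, C in enumerate(codebooks))
```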
2. Limitations of Classical RQ-KMeans
RQ-KMeans exposes several notable deficiencies, especially in high dimensions:
- Train–test generalization gap: In high-dimensional spaces, k-means centroids optimize training-set distortion but generalize poorly, leading to elevated test errors (Ferdowsi et al., 2017, Ferdowsi et al., 2017).
- Storage and computational cost: Dense codebooks scale poorly, on the order of $K \times d$ parameters per layer.
- Diminishing returns: Deeper layers receive noisy, low-norm residuals, making centroid structure non-informative ("vanishing benefit" phenomenon).
- Residual heterogeneity: Since the residual distribution after each assignment depends on prior choices, fitting a global codebook is suboptimal (Huijben et al., 2024, Vallaeys et al., 6 Jan 2025).
- Encoding NP-hardness: Exact optimal code assignment over multiple stages is NP-hard due to cross-terms, necessitating approximations or greedy heuristics (Liu et al., 2015).
These limitations motivate variants with better regularization, structural adaptation, or algorithmic modifications.
3. Algorithmic Enhancements and Variants
Improved Training and Encoding
- Warm-started K-Means (ICL): Initialization of k-means codebooks in low-dimensional PCA subspaces, progressively increasing the subspace size. This approach yields more information-dense and robust codebooks (Liu et al., 2015).
- Beam-Search Multi-Path Encoding: Instead of greedy encoding, maintain the top-$L$ hypotheses at each stage, expanding combinations to correct early-stage errors and reduce distortion (often by 10–40% over greedy), as sketched after this list (Liu et al., 2015).
- Cluster-wise Transformations (TRQ): After each cluster assignment, transform residuals via per-cluster orthogonal matrices to isotropize them before the next quantization step, reducing quantization distortion and improving search recall (Yuan et al., 2015).
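A minimal sketch of beam-search (multi-path) encoding for a single vector, keeping the top-`beam` partial solutions at every layer; the function name and beam width are illustrative:

```python
import numpy as np

def encode_beam(x, codebooks, beam=8):
    """Keep the `beam` best code prefixes (by residual energy) at every layer."""
    # each hypothesis is a pair (codes_so_far, current_residual)
    hypotheses = [([], x)]
    for C in codebooks:                                   # C has shape (K, d)
        candidates = []
        for codes, r in hypotheses:
            d2 = ((r[None, :] - C) ** 2).sum(-1)          # distance to every codeword
            for k in np.argsort(d2)[:beam]:               # expand only the closest few
                candidates.append((codes + [int(k)], r - C[k]))
        # keep the hypotheses with the smallest residual norm
        candidates.sort(key=lambda h: float((h[1] ** 2).sum()))
        hypotheses = candidates[:beam]
    best_codes, _ = hypotheses[0]
    return best_codes
```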
Regularized Codebook Design
- Variance Regularization (RRQ): Codebook vectors are sampled or optimized to match a "reverse water-filling" variance profile. At each layer, the per-dimension codeword variance is regularized to
$$\tilde{\sigma}_j^2 = \max\!\big(0,\; \sigma_j^2 - \gamma\big),$$
where $\sigma_j^2$ is the residual variance in dimension $j$ and the threshold $\gamma$ is chosen to satisfy the rate constraint. This suppresses overfitting and ensures codewords are sparse in low-variance directions; see the sketch at the end of this subsection (Ferdowsi et al., 2017, Ferdowsi et al., 2017).
- VR-KMeans: Imposes a penalty to enforce codebook dimension variances to track the water-filling solution, thus controlling both sparsity and overfitting (Ferdowsi et al., 2017).
These enhancements allow multi-layer quantizers to scale to hundreds or thousands of layers without the degeneracies of unregularized k-means cascades.
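The threshold $\gamma$ in the variance profile above can be found by a simple bisection. A minimal sketch, assuming the rate constraint is expressed as a total codeword-variance budget (this budget form is an illustrative stand-in for the exact constraint used in the papers):

```python
import numpy as np

def waterfilling_variances(residual_var, budget):
    """Return per-dimension codeword variances max(0, sigma_j^2 - gamma),
    with gamma chosen by bisection so the variances sum to `budget`."""
    lo, hi = 0.0, float(residual_var.max())
    for _ in range(100):                       # bisection on the threshold gamma
        gamma = 0.5 * (lo + hi)
        v = np.maximum(0.0, residual_var - gamma)
        if v.sum() > budget:
            lo = gamma                         # too much variance kept: raise gamma
        else:
            hi = gamma
    return np.maximum(0.0, residual_var - 0.5 * (lo + hi))
```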
4. Regularized Residual Quantization (RRQ) Framework
RRQ emerges as a practical improvement over RQ-KMeans for high-dimensional data and deep quantization stacks:
- Preprocessing: Transform images using 2D DCT, split into sub-bands, decorrelate via PCA, yielding dimensionally sorted and nearly independent features.
- Layered Regularized Codebook Generation: At each layer, after computing the per-dimension residual variances $\sigma_j^2$, solve for the water-filling threshold $\gamma$, construct a diagonal covariance from the regularized variances $\max(0, \sigma_j^2 - \gamma)$, and sample or optimize codewords with these prescribed variances, as sketched below.
- Sparsity and Robustness: The soft-thresholding by $\gamma$ induces sparsity, discarding low-variance dimensions and mitigating overfitting; empirical results show RRQ achieves a negligible train–test distortion gap and supports much deeper quantizer stacks than RQ-KMeans (Ferdowsi et al., 2017, Ferdowsi et al., 2017).
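A minimal sketch of one layer of this generation step, assuming codewords are simply sampled from a zero-mean Gaussian with the prescribed diagonal covariance (sampling rather than optimization; `gamma` would come from a threshold search such as the one sketched in the previous section):

```python
import numpy as np

def regularized_codebook(residuals, K, gamma, rng=np.random.default_rng(0)):
    """Draw K codewords whose per-dimension variances follow max(0, sigma_j^2 - gamma)."""
    var = residuals.var(axis=0)                 # per-dimension residual variance sigma_j^2
    reg_var = np.maximum(0.0, var - gamma)      # reverse water-filling profile
    # low-variance dimensions get exactly zero variance, i.e. sparse codewords
    return rng.normal(size=(K, residuals.shape[1])) * np.sqrt(reg_var)
```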
Empirical validation on CroppedYale-B faces demonstrates superior test PSNR at low bit-rates (outperforming JPEG-2000 below ~0.05 bpp) and effective denoising of noisy test images, even rivaling BM3D at moderate noise levels (Ferdowsi et al., 2017).
5. Neural and Hybrid RQ-KMeans Frameworks
Recent developments introduce neural adaptations that address RQ-KMeans's core inefficiency of using fixed codebooks:
- QINCo: Replaces static codebooks at each layer with small residual MLPs conditioned on the previous partial reconstruction. Each codeword is contextually specialized for the region of feature space being quantized:
$$c_k^{(m)}\big(\hat{x}^{(m-1)}\big) = \bar{c}_k^{(m)} + f_{\theta}^{(m)}\big(\bar{c}_k^{(m)}, \hat{x}^{(m-1)}\big),$$
where $\bar{c}_k^{(m)}$ is a base codeword and $\hat{x}^{(m-1)}$ the partial reconstruction after $m-1$ stages. This parameterization allows efficient realization of an exponential number of local codebooks with storage complexity only linear in the number of layers, the codebook size, and the data dimension. QINCo yields substantial improvements in MSE and recall at fixed code size: for instance, 16-byte codes on BigANN1M achieve 0.32 MSE (QINCo) versus 1.30 (RQ), and recall@1 increases from 49.0% to 71.9% (Huijben et al., 2024); a minimal sketch of the idea appears at the end of this section.
- QINCo2: Augments QINCo with codeword pre-selection, beam-search encoding (to mitigate assignment errors), and a fast pairwise additive decoder for efficient large-scale retrieval. These additions further reduce MSE (e.g., 34% lower MSE on BigANN compared to QINCo) and increase search recall, with gains of up to +24% absolute recall on Deep1M at 8 bytes (Vallaeys et al., 6 Jan 2025).
Neural RQ variants require significantly more computation, especially at encoding time, but deliver substantial improvements in compression fidelity and retrieval effectiveness.
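The implicit-codebook idea can be illustrated with a single numpy layer: a shared base codebook plus a small MLP that shifts each codeword as a function of the partial reconstruction. This is a minimal sketch; the layer sizes, the ReLU MLP, and the additive form are simplifications, not the exact published architectures:

```python
import numpy as np

class ImplicitCodebookLayer:
    """One RQ layer whose K codewords adapt to the partial reconstruction x_hat."""
    def __init__(self, K, d, hidden=64, rng=np.random.default_rng(0)):
        self.base = rng.normal(size=(K, d)) * 0.1          # static base codewords
        self.W1 = rng.normal(size=(2 * d, hidden)) * 0.1   # toy (untrained) MLP weights
        self.W2 = rng.normal(size=(hidden, d)) * 0.1

    def codewords(self, x_hat):
        """Contextual codebook: base codeword + MLP(base codeword, x_hat)."""
        inp = np.concatenate([self.base, np.broadcast_to(x_hat, self.base.shape)], axis=1)
        return self.base + np.maximum(inp @ self.W1, 0.0) @ self.W2   # ReLU MLP shift

    def quantize(self, x, x_hat):
        """Assign the residual x - x_hat to the nearest contextual codeword."""
        C = self.codewords(x_hat)                          # (K, d), specialized per x_hat
        k = int(((x - x_hat - C) ** 2).sum(-1).argmin())
        return k, x_hat + C[k]                             # code and updated reconstruction
```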
6. Empirical Comparisons and Applications
RQ-KMeans and its derivatives are evaluated across domains:
- Compression: RRQ consistently outperforms JPEG-2000 at low bit rates for images, with extremely narrow generalization gaps due to strong regularization (Ferdowsi et al., 2017).
- Denoising: RRQ, trained solely on clean images, denoises test images without retraining, outperforming or matching BM3D in PSNR, especially at high noise variance (Ferdowsi et al., 2017).
- Large-Scale Approximate Nearest Neighbor (ANN) Search: RQ-KMeans, TRQ, QINCo, and QINCo2 are integrated with multi-index or IVF schemes. QINCo-based methods provide up to 20 points higher recall@1 over classic RQ-KMeans, with efficient shortlisting via pairwise-coded decoders (Yuan et al., 2015, Huijben et al., 2024, Vallaeys et al., 6 Jan 2025).
- Super-Resolution: RRQ-based super-resolvers restore high-frequency details in low-resolution facial images by reconstructing with multi-layer codebooks learned from high-resolution data (Ferdowsi et al., 2017).
7. Practical Considerations and Outlook
- Hyperparameter Choice: The number of layers $M$, codebook size $K$, and regularization strength $\lambda$ control the rate–distortion–complexity tradeoff. Practitioners typically choose $M$ to match a target distortion drop, $K$ in the range 128–512, and $\lambda$ in $[0.1, 10]$ for regularized variants (Ferdowsi et al., 2017).
- Method Selection: RQ-KMeans remains attractive for its conceptual simplicity and low computational burden, especially suitable for CPU-efficient and hardware-constrained scenarios. In contrast, RRQ and neural extensions (QINCo, QINCo2) require more computation or more complex infrastructure but deliver state-of-the-art rate–distortion performance for both compression and nearest-neighbor retrieval (Huijben et al., 2024, Vallaeys et al., 6 Jan 2025).
- Future Directions: A plausible implication is that further modeling of residual dependencies, hybridization with product quantization (PQ), and scalable neural codebook parameterizations will continue to close the gap to theoretical rate-distortion limits, especially in high-dimensional and semantically structured data regimes.
Key References:
- (Ferdowsi et al., 2017) "A multi-layer image representation using Regularized Residual Quantization: application to compression and denoising"
- (Ferdowsi et al., 2017) "Regularized Residual Quantization: a multi-layer sparse dictionary learning approach"
- (Liu et al., 2015) "Improved Residual Vector Quantization for High-dimensional Approximate Nearest Neighbor Search"
- (Yuan et al., 2015) "Transformed Residual Quantization for Approximate Nearest Neighbor Search"
- (Huijben et al., 2024) "Residual Quantization with Implicit Neural Codebooks"
- (Vallaeys et al., 6 Jan 2025) "Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks"