Recursive Residual Quantization
- Recursive Residual Quantization is a multi-stage approach that approximates inputs as the sum of sequentially quantized residuals for enhanced accuracy.
- It underpins techniques like residual vector, scalar, and binary quantization, facilitating applications in neural compression and large-scale search.
- Recent advancements leverage learnable scaling, invertible normalization, and neural codebooks to mitigate residual decay and improve performance.
Recursive residual quantization is a multi-stage quantization paradigm where an input signal, tensor, or vector is approximated as the sum of quantized outputs from a sequence of distinct quantization operators, each operating on the residual left by the previous stage. This recursive framework underpins classical and modern quantization techniques for neural compression, large-scale vector search, and low-precision inference, offering exponential error decay with the number of stages and flexibly trading off rate, distortion, and hardware efficiency.
1. Fundamental Principles and Mathematical Formulation
At the core of recursive residual quantization, the input $x$ is iteratively approximated by a sum of quantized representations:

$$\hat{x} = \sum_{t=1}^{T} Q_t(r_{t-1}), \qquad r_0 = x, \qquad r_t = r_{t-1} - Q_t(r_{t-1}).$$

Here, $Q_t$ can be a scalar, vector, or binary quantization operator, and $T$ is the number of stages. Each quantizer operates on the latest residual, extracting the largest remaining quantizable component. This recursive expansion naturally generalizes to popular cases such as residual vector quantization (RVQ), scalar schemes, and binarization.

The quantization error after $T$ stages satisfies

$$\|x - \hat{x}\| = \|r_T\|,$$

where, under suitable design, $\|r_T\|$ decays exponentially with $T$ (Yvinec et al., 2022, Li et al., 2017).
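A minimal sketch of this recursion, using a simple uniform rounding rule as each stage quantizer $Q_t$; the bit-width, stage count, and quantizer choice here are illustrative rather than taken from any cited method:

```python
import numpy as np

def uniform_quantize(r, n_bits=4):
    """One-stage uniform quantizer sized to the residual's current range."""
    scale = np.max(np.abs(r)) + 1e-12           # per-stage scale
    step = 2 * scale / (2**n_bits - 1)          # uniform bin width
    return np.round(r / step) * step            # quantized approximation Q_t(r)

def recursive_residual_quantize(x, n_stages=4, n_bits=4):
    """Approximate x as the sum of stagewise quantized residuals."""
    residual = x.astype(np.float64)
    x_hat = np.zeros_like(residual)
    for _ in range(n_stages):
        q = uniform_quantize(residual, n_bits)  # Q_t(r_{t-1})
        x_hat += q                              # running reconstruction
        residual -= q                           # r_t = r_{t-1} - Q_t(r_{t-1})
    return x_hat, residual

x = np.random.randn(1024)
x_hat, r = recursive_residual_quantize(x, n_stages=4, n_bits=4)
print(np.linalg.norm(r))   # final residual norm shrinks as stages are added
```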
2. Classical and Modern Algorithms
Recursive residual quantization underpins several algorithmic families:
- Residual Vector Quantization (RVQ/RQ): Each stage employs a learned vector codebook (via $k$-means) to quantize the current residual, forming a code that is a $T$-tuple of stagewise codeword indices (Yuan et al., 2015, Liu et al., 2015, Huijben et al., 26 Jan 2024); a minimal sketch follows this list.
- Scalar Residual Quantization: Each stage applies a scalar quantizer to residuals, typically with uniform bins (Zhu, 20 Aug 2025).
- High-Order Binary Quantization: HORQ recursively binarizes the input and subsequent residuals, achieving higher accuracy than single-stage binarization (Li et al., 2017).
- Neural/Adaptive Extensions: QINCo constructs data-dependent codebooks at each stage via neural networks, conditioned on the running quantized sum, achieving improved accuracy and dynamic-rate adaptability (Huijben et al., 26 Jan 2024).
- Data-Free Expansion (REx): Stages are constructed directly from a pre-trained model in a calibration-free manner, expanding the quantized representation by repeated quantization and group-wise pruning (Yvinec et al., 2022).
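A minimal residual vector quantization sketch along the lines of the first item above, using scikit-learn's KMeans to fit one codebook per stage; the stage count, codebook size, and greedy nearest-codeword encoding are illustrative simplifications rather than the exact procedures of the cited works:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_rvq(X, n_stages=4, codebook_size=64, seed=0):
    """Learn one k-means codebook per stage on the running residuals."""
    codebooks, residual = [], X.copy()
    for _ in range(n_stages):
        km = KMeans(n_clusters=codebook_size, n_init=4, random_state=seed).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]   # pass residual to next stage
    return codebooks

def rvq_encode(x, codebooks):
    """Greedy encoding: at each stage pick the codeword nearest to the residual."""
    residual, codes = x.copy(), []
    for C in codebooks:
        idx = int(np.argmin(np.linalg.norm(residual - C, axis=1)))
        codes.append(idx)
        residual = residual - C[idx]
    return codes                                 # a T-tuple of stagewise indices

def rvq_decode(codes, codebooks):
    return sum(C[i] for i, C in zip(codes, codebooks))

X = np.random.randn(5000, 32).astype(np.float32)
codebooks = train_rvq(X, n_stages=4, codebook_size=64)
codes = rvq_encode(X[0], codebooks)
print(np.linalg.norm(X[0] - rvq_decode(codes, codebooks)))
```

Because each additional stage refines the previous reconstruction, decoding a prefix of the index tuple yields a coarser but valid approximation, which is what makes the code naturally rate-scalable.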
A comparison of typical models:
| Method | Code Representation | Quantizer type |
|---|---|---|
| RVQ/RQ (classic) | $T$-tuple of codeword indices | k-means, vector |
| RFSQ | $T$ stagewise quantized scalars | Fixed scalar bins |
| HORQ | $T$ binary vectors + scalars | Sign, scaling |
| QINCo | $T$-tuple of indices, neural codebooks | MLP-adapted vectors |
| REx | $T$ quantized weight tensors | Uniform, group sparse |
3. Limitations and Advanced Conditioning
While recursive quantization enables multi-stage error correction, it suffers from diminishing residual magnitude: each stage reduces the signal's norm, leaving little quantizable information for late stages (“residual magnitude decay problem” (Zhu, 20 Aug 2025)). This leads to vanishing codebook entropy, reduced coding utility, and optimization difficulties in deep cascades (Liu et al., 2015).
Recent work introduces conditioning techniques to maintain meaningful residuals:
- Learnable Scaling Factors (Zhu, 20 Aug 2025): Each residual is amplified or attenuated by a learned scalar $\alpha_t$ before quantization, then inversely scaled after: $\hat{r}_t = \tfrac{1}{\alpha_t}\, Q_t(\alpha_t r_t)$.
- Invertible Layer Normalization (Zhu, 20 Aug 2025): Each residual is normalized to zero mean and unit variance, quantized, and then exactly denormalized with trainable affine parameters $\gamma, \beta$: $\tilde{r}_t = \gamma \odot \tfrac{r_t - \mu_t}{\sigma_t} + \beta$, followed by $\hat{r}_t = \sigma_t \odot \tfrac{Q_t(\tilde{r}_t) - \beta}{\gamma} + \mu_t$.
These strategies enforce uniform dynamic range and conditioning across all quantization stages, demonstrably improving both optimization stability and information throughput.
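A minimal sketch of the two conditioning strategies, treating the scale `alpha` and the affine parameters `gamma` and `beta` as fixed constants for illustration (in the cited work they are learned end-to-end), with a placeholder fixed-grid quantizer standing in for each stage:

```python
import numpy as np

def quantize(r, n_bits=4):
    """Placeholder stage quantizer: uniform rounding on a fixed grid of step 2/(2^b - 1)."""
    step = 2.0 / (2**n_bits - 1)
    return np.round(r / step) * step

def scaled_stage(r, alpha, n_bits=4):
    """Learnable-scaling variant: amplify the residual, quantize, then undo the scale."""
    return quantize(alpha * r, n_bits) / alpha

def invertible_ln_stage(r, gamma, beta, n_bits=4, eps=1e-6):
    """Invertible-LayerNorm variant: normalize, affine-transform, quantize, denormalize exactly."""
    mu, sigma = r.mean(), r.std() + eps
    r_norm = gamma * (r - mu) / sigma + beta      # zero mean / unit variance, then affine
    q = quantize(r_norm, n_bits)
    return sigma * (q - beta) / gamma + mu        # exact inverse of the normalization

# A small-magnitude, late-stage residual: a fixed grid mostly rounds it to zero.
r = 0.05 * np.random.randn(256)
print("plain  :", np.linalg.norm(r - quantize(r)))
print("scaled :", np.linalg.norm(r - scaled_stage(r, alpha=10.0)))
print("inv-LN :", np.linalg.norm(r - invertible_ln_stage(r, gamma=1.0, beta=0.0)))
```

With the fixed grid, the plain late-stage residual falls almost entirely below the bin width, whereas both conditioned variants keep the residual within the quantizer's useful dynamic range.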
4. Theoretical Guarantees and Error Analysis
Recursive residual quantization exhibits provable exponential error decay. For uniform $b$-bit quantizers whose range is re-fit to the current residual, after $T$ stages the error for a scalar $w$ is bounded by

$$|w - \hat{w}| \;\le\; \frac{\lambda}{(2^b - 1)^{T}},$$

where $\lambda$ is the quantization scale of the first stage (Yvinec et al., 2022). Layerwise in networks, the spectral norm of cumulative residuals contracts multiplicatively, and under group-sparse masking the bound only weakly deteriorates.
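A quick numerical check of this contraction for a single scalar, assuming each stage's uniform quantizer is re-fit to the magnitude of the current residual (a construction consistent with, though simpler than, the cited analysis):

```python
def stagewise_scalar_error(w, n_bits=2, n_stages=6):
    """Track |w - w_hat| as stages accumulate; each stage spans the current residual's range."""
    residual, w_hat, errors = w, 0.0, []
    for _ in range(n_stages):
        scale = abs(residual) + 1e-15
        step = 2 * scale / (2**n_bits - 1)
        q = round(residual / step) * step
        w_hat += q
        residual -= q
        errors.append(abs(w - w_hat))   # stays within lambda / (2^b - 1)^t
    return errors

print(stagewise_scalar_error(0.7371, n_bits=2, n_stages=6))
# each entry is roughly 1/(2^b - 1) of the previous one, i.e. geometric decay
```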
For HORQ, with binary basis $H_i = \operatorname{sign}(R_{i-1})$ and scale $\beta_i = \tfrac{1}{n}\|R_{i-1}\|_1$ over $n$ elements, the residual $R_i = R_{i-1} - \beta_i H_i$ satisfies

$$\|R_i\|_2^2 \;=\; \|R_{i-1}\|_2^2 - n\beta_i^2 \;\le\; \|R_{i-1}\|_2^2,$$

with monotonic $\ell_2$-error reduction in the recursion order (Li et al., 2017).
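A minimal sketch of this binary recursion, with the sign basis and mean-absolute-value scale as written above; the input tensor and recursion order are arbitrary illustrative choices:

```python
import numpy as np

def horq(x, order=2):
    """High-order residual binarization: x ~ sum_i beta_i * sign(R_{i-1})."""
    residual = x.astype(np.float64)
    betas, bases = [], []
    for _ in range(order):
        beta = np.mean(np.abs(residual))         # beta_i = ||R_{i-1}||_1 / n
        h = np.sign(residual)                    # H_i = sign(R_{i-1})
        betas.append(beta)
        bases.append(h)
        residual = residual - beta * h           # ||R_i||_2 decreases at every step
    return betas, bases, residual

x = np.random.randn(4096)
for k in range(1, 4):
    _, _, r = horq(x, order=k)
    print(k, np.linalg.norm(r))                  # monotone l2-error reduction with order
```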
These contraction properties are supported by both practical error curves and theoretical analyses (Yvinec et al., 2022, Li et al., 2017, Liu et al., 2015).
5. Key Applications and Empirical Results
Recursive residual quantization is established in several core domains:
- Neural Compression: RFSQ and multi-stage FSQ outperform vector quantization and single-stage baselines in image compression tasks. On ImageNet (128×128, 12 bits/token), RFSQ with invertible LayerNorm achieves a 28.7% L1 error reduction and 45% LPIPS improvement over FSQ; PSNR rises to 22.9 dB vs. 20.3 dB (FSQ) (Zhu, 20 Aug 2025).
- Model Quantization: REx delivers flexible architectures supporting variable bit-widths and accuracy-cost trade-offs. In EfficientNet-B0 (W2/A8), a sparse order-10 REx recovers 100% full-precision accuracy with just 10% extra bit-ops (Yvinec et al., 2022).
- Large-Scale Approximate Nearest Neighbor Search: Improved RVQ, Transformed RQ, and QINCo consistently outpace both product quantization (PQ) and conventional RQ, with QINCo achieving substantial recall and MSE improvements (e.g., BigANN1M Recall@1: 71.9% for QINCo-16B vs. 51.1% for the best prior method) (Huijben et al., 26 Jan 2024). Transformed RQ yields up to +8% absolute Recall@1 at scale (Yuan et al., 2015).
- Network Acceleration/Binarization: HORQ, using higher-order recursions, recovers much of the accuracy lost to naïve binarization; e.g., on CIFAR-10, order-2 HORQ attains 77% test accuracy (2% below full precision, shrinking the accuracy loss by 60% relative to order-1) and delivers 20–30× acceleration (Li et al., 2017).
6. Methodological Innovations and Extensions
Modern recursive quantization incorporates several methodological advances:
- Per-cluster/local transforms: TRQ applies cluster-wise orthogonal transforms to align residuals, lowering overall distortion and improving recall (Yuan et al., 2015).
- Hybrid quantization and multi-path encoding: IRVQ integrates subspace-based clustering with beam search for encoding, preserving codebook entropy and improving high-dimensional search (Liu et al., 2015).
- Neural codebook adaptation: QINCo parametrizes codewords conditionally on the running reconstruction, yielding highly adaptive, low-distortion codes and efficient dynamic-rate coding (Huijben et al., 26 Jan 2024); a conceptual sketch follows this list.
- Structured sparsity: REx employs group-wise sparse masking of higher-order residuals to reduce computation, with negligible loss of accuracy (Yvinec et al., 2022).
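A conceptual sketch of neural codebook adaptation in this spirit: a small per-stage MLP, whose weights here are random placeholders rather than trained QINCo parameters, perturbs each base codeword as a function of the running reconstruction before greedy nearest-codeword selection:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, T, H = 16, 32, 4, 64                        # dim, codebook size, stages, hidden width

# Base codebooks and per-stage MLP weights (random placeholders standing in for trained parameters).
base_codebooks = rng.normal(size=(T, K, D))
W1 = rng.normal(scale=0.1, size=(T, 2 * D, H))
W2 = rng.normal(scale=0.1, size=(T, H, D))

def adapt_codebook(t, x_hat):
    """Condition stage-t codewords on the running reconstruction x_hat."""
    ctx = np.broadcast_to(x_hat, (K, D))
    feats = np.concatenate([base_codebooks[t], ctx], axis=1)     # (K, 2D)
    return base_codebooks[t] + np.tanh(feats @ W1[t]) @ W2[t]    # residual MLP update

def encode(x):
    x_hat, codes = np.zeros(D), []
    for t in range(T):
        C = adapt_codebook(t, x_hat)                             # stage-specific, data-dependent codebook
        idx = int(np.argmin(np.linalg.norm((x - x_hat) - C, axis=1)))
        codes.append(idx)
        x_hat = x_hat + C[idx]
    return codes, x_hat

x = rng.normal(size=D)
codes, x_hat = encode(x)
print(codes, np.linalg.norm(x - x_hat))
```

Because the adapted codebook must be recomputed at every stage of every query, encoding cost grows with the network size, which is the encoding-compute limitation noted below.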
7. Broader Implications, Limitations, and Outlook
Recursive residual quantization instantiates a unifying architecture for modern quantization across neural compression, efficient inference, and large-scale search. Key insights include the criticality of residual signal conditioning (through normalization or scaling) and the necessity of entropy-preservation in deep cascades. While extensions like neural codebooks and layerwise transforms address many limitations, high model size and encoding compute (notably in QINCo) remain practical challenges (Huijben et al., 26 Jan 2024). Multi-path search and sparse coding further mitigate but do not eliminate bottlenecks for very high-dimensional or resource-constrained regimes.
A plausible implication is that further advances may leverage hierarchical conditioning, dynamic codebook parameterization, and hardware/software co-design to optimize both fidelity and efficiency. The recursive residual framework, with its exponential convergence and algorithmic flexibility, continues to serve as a foundational tool for high-fidelity, high-efficiency quantization (Zhu, 20 Aug 2025, Yvinec et al., 2022, Huijben et al., 26 Jan 2024, Liu et al., 2015, Yuan et al., 2015, Li et al., 2017).