Adaptive Residual Scalar Quantization
- The paper introduces adaptive conditioning strategies such as learnable scaling and invertible layer normalization to counter residual magnitude decay in multi‐stage scalar quantization.
- It employs a cascaded framework that refines quantization progressively using finite scalar quantizers combined with straight-through gradient estimators.
- Empirical results demonstrate significant reductions in perceptual loss and L1 error, outperforming traditional quantization methods in neural compression tasks.
Adaptive Residual Scalar Quantization (ARSQ) refers to a class of quantization methods in which a signal is quantized using multiple sequential scalar quantizers, with each stage quantizing the residual error of the previous one. Critically, adaptive mechanisms—such as stage-wise learnable scaling or normalization—are introduced to address dynamic range mismatch in the residuals, preserving quantization effectiveness throughout all stages. ARSQ achieves improved rate-distortion tradeoffs and stability compared to traditional vector quantization and non-adaptive multi-stage scalar quantization, and has demonstrated robust performance in neural compression pipelines and large-scale retrieval settings (Zhu, 20 Aug 2025, Huijben et al., 26 Jan 2024).
1. Finite Scalar Quantization and Multi-Stage Residual Frameworks
Finite Scalar Quantization (FSQ) is defined by mapping each real-valued coordinate independently to one of $L$ uniformly spaced levels in $[-1, 1]$. For a scalar $z \in [-1, 1]$, the quantization operator is $q(z) = \frac{2}{L-1}\,\mathrm{round}\!\left(\frac{L-1}{2}\, z\right)$, and the quantized vector in $d$ dimensions is $\hat{\mathbf{z}} = \big(q(z_1), \dots, q(z_d)\big)$, yielding an implicit codebook of $L^{d}$ indices. To facilitate gradient-based training, the straight-through estimator with an identity Jacobian is used during backpropagation.
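As a concrete illustration, the following is a minimal PyTorch sketch of such a per-coordinate quantizer with a straight-through gradient; the function name `fsq_quantize`, the 8-level grid, and the tanh squashing are illustrative assumptions, not the paper's implementation.

```python
import torch

def fsq_quantize(z: torch.Tensor, levels: int = 8) -> torch.Tensor:
    """Round each coordinate of z to one of `levels` uniformly spaced grid
    points in [-1, 1]; gradients pass through as the identity (STE)."""
    half = (levels - 1) / 2.0
    q = torch.round(z.clamp(-1.0, 1.0) * half) / half   # hard quantization
    return z + (q - z).detach()                         # identity Jacobian in backward

# Usage: quantize a 4-dimensional vector with 8 levels per coordinate.
x = torch.randn(4, requires_grad=True)
z_hat = fsq_quantize(torch.tanh(x))       # tanh keeps inputs in [-1, 1]
z_hat.sum().backward()                    # x.grad is populated as if q were the identity
```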
Residual quantization extends scalar quantization by cascading quantization stages. With $r_0 = x$, the $i$-th stage quantizes its input $r_{i-1}$ to obtain the approximation $\hat{r}_i = q(r_{i-1})$ and propagates the residual $r_i = r_{i-1} - \hat{r}_i$. The total reconstruction is given by the sum $\hat{x} = \sum_{i=1}^{N} \hat{r}_i$ (Zhu, 20 Aug 2025). For scalar or vector $x$, this framework enables progressive refinement of the quantized representation, as in classical residual quantization methods (Huijben et al., 26 Jan 2024).
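A hedged sketch of the cascaded residual loop, reusing the same toy quantizer (repeated so the snippet stands alone); the stage and level counts are arbitrary choices, not the paper's configuration.

```python
import torch

def fsq_quantize(z, levels=8):
    half = (levels - 1) / 2.0
    q = torch.round(z.clamp(-1.0, 1.0) * half) / half
    return z + (q - z).detach()                  # straight-through estimator

def residual_fsq(x, num_stages=4, levels=8):
    """Cascade of scalar quantization stages: stage i quantizes the residual
    left by stage i-1; the reconstruction is the sum of all stage outputs."""
    residual = x
    reconstruction = torch.zeros_like(x)
    for _ in range(num_stages):
        r_hat = fsq_quantize(residual, levels)   # quantize the current residual
        reconstruction = reconstruction + r_hat  # accumulate the approximation
        residual = residual - r_hat              # propagate what remains
    return reconstruction

x = torch.tanh(torch.randn(16))
print((x - residual_fsq(x)).abs().mean())        # mean reconstruction error
```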
2. Residual Magnitude Decay and Its Consequences
In conventional multi-stage FSQ or residual quantization, a severe residual magnitude decay problem arises. Each successive residual has norm $\|r_i\| \le \|r_{i-1}\|$, and empirically this decay is close to exponential: $\|r_i\| \approx \gamma^{i}\,\|r_0\|$ for some $\gamma \in (0, 1)$ (Zhu, 20 Aug 2025). This collapse in residual dynamic range deprives later stages of meaningful signal, leading to ineffective quantization in higher-order stages and suboptimal rate-distortion performance.
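To make the dynamic-range mismatch concrete, the following toy experiment (an assumed setup, not taken from the paper) tracks how quickly a fixed, non-adaptive grid starves later stages:

```python
import torch

torch.manual_seed(0)
half = (8 - 1) / 2.0                                   # 8-level grid on [-1, 1]
r = torch.tanh(torch.randn(10_000))                    # stage-0 input
for i in range(1, 5):
    r_hat = torch.round(r.clamp(-1, 1) * half) / half  # non-adaptive stage-i quantizer
    zero_frac = (r_hat == 0).float().mean().item()
    print(f"stage {i}: ||r_in|| = {r.norm().item():7.2f}, zero updates = {zero_frac:.1%}")
    r = r - r_hat
# After stage 1 the residual falls below half a grid step, so later stages
# round it almost entirely to zero and contribute little further information.
```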
3. Adaptive Conditioning Strategies in Residual Scalar Quantization
Robust Residual Finite Scalar Quantization (RFSQ) introduces two reversible, adaptive conditioning strategies to counter residual magnitude decay: learnable scaling factors and invertible layer normalization.
a. Learnable Scaling Factors
At each quantization stage $i$, a positive scalar $\alpha_i > 0$ scales the residual: $\tilde{r}_{i-1} = \alpha_i\, r_{i-1}$. The scaled input is quantized, then rescaled back: $\hat{r}_i = q(\alpha_i\, r_{i-1}) / \alpha_i$. The next residual is $r_i = r_{i-1} - \hat{r}_i$. Each $\alpha_i$ is optimized jointly with the model parameters during end-to-end training, tuning the residual's dynamic range to align with the FSQ quantization grid.
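A possible PyTorch sketch of this strategy, assuming a log-parameterized scale to keep each $\alpha_i$ strictly positive; the module and function names are illustrative.

```python
import torch
import torch.nn as nn

def fsq_quantize(z, levels=8):
    half = (levels - 1) / 2.0
    q = torch.round(z.clamp(-1.0, 1.0) * half) / half
    return z + (q - z).detach()                        # straight-through estimator

class ScaledResidualFSQ(nn.Module):
    """Residual FSQ with one learnable positive scale per stage (sketch)."""
    def __init__(self, num_stages=4, levels=8):
        super().__init__()
        self.levels = levels
        # log-parameterization keeps the effective scale alpha_i strictly positive
        self.log_alpha = nn.Parameter(torch.zeros(num_stages))

    def forward(self, x):
        residual = x
        reconstruction = torch.zeros_like(x)
        for log_a in self.log_alpha:
            alpha = log_a.exp()
            r_hat = fsq_quantize(alpha * residual, self.levels) / alpha
            reconstruction = reconstruction + r_hat
            residual = residual - r_hat
        return reconstruction
```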
b. Invertible Layer Normalization
For each residual $r_{i-1}$, compute the empirical mean $\mu_i$ and standard deviation $\sigma_i$. Apply the normalization $\tilde{r}_{i-1} = (r_{i-1} - \mu_i)/\sigma_i$, quantize, then invert the normalization: $\hat{r}_i = \sigma_i\, q(\tilde{r}_{i-1}) + \mu_i$. The quantizer therefore sees a zero-mean, unit-variance input at every stage, ensuring a well-matched dynamic range and effective sequential quantization. Both strategies retain perfect invertibility up to quantization error (Zhu, 20 Aug 2025).
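A minimal sketch of one such stage, assuming per-sample statistics over the last dimension and a small epsilon for numerical stability; this is an interpretation of the description above, not the paper's code.

```python
import torch

def fsq_quantize(z, levels=8):
    half = (levels - 1) / 2.0
    q = torch.round(z.clamp(-1.0, 1.0) * half) / half
    return z + (q - z).detach()          # straight-through estimator

def layernorm_rfsq_stage(residual, levels=8, eps=1e-6):
    """One RFSQ stage with invertible normalization (sketch): normalize the
    residual to zero mean / unit variance, quantize, then undo the transform."""
    mu = residual.mean(dim=-1, keepdim=True)
    sigma = residual.std(dim=-1, keepdim=True) + eps
    r_hat = sigma * fsq_quantize((residual - mu) / sigma, levels) + mu
    return r_hat, residual - r_hat       # stage output and next-stage residual
```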
4. Training Objectives in Adaptive Residual Scalar Quantization
RFSQ is trained within a neural compression pipeline using pixel-level and perceptual objectives. For an input image $x$ and reconstruction $\hat{x}$, the loss is $\mathcal{L} = \|x - \hat{x}\|_1 + \lambda\,\mathcal{L}_{\mathrm{LPIPS}}(x, \hat{x})$, combining an $L_1$ distance for pixel fidelity with the LPIPS perceptual loss. No additional regularizers or codebook-utilization terms are required, as FSQ and the adaptive conditioning are fully invertible and do not involve learned codebooks (Zhu, 20 Aug 2025).
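A hedged sketch of this objective, assuming the publicly available `lpips` package for the perceptual term and a hypothetical weight `lambda_lpips` (the paper's exact weighting is not reproduced here):

```python
import torch
import lpips                              # pip install lpips

perceptual = lpips.LPIPS(net="vgg")       # assumed backbone; inputs expected in [-1, 1]

def rfsq_loss(x, x_hat, lambda_lpips=1.0):
    """L1 pixel term plus a weighted LPIPS term; lambda_lpips is illustrative."""
    l1 = (x - x_hat).abs().mean()
    lp = perceptual(x, x_hat).mean()
    return l1 + lambda_lpips * lp
```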
5. Empirical Performance and Comparative Results
Comprehensive evaluations on ImageNet at a fixed 12.0-bit rate (4,096 total quantization indices) indicate that RFSQ variants significantly outperform VQ-EMA, FSQ, and LFQ baselines:
| Method | Perceptual Loss | $L_1$ Error | PSNR |
|---|---|---|---|
| FSQ | 0.182 | 0.143 | 20.3 dB |
| RFSQ-4×1024 (LayerNorm) | 0.100 | 0.102 | 22.9 dB |
These figures correspond to a 45% relative reduction in perceptual loss and a 28.7% reduction in $L_1$ error for the RFSQ-4×1024-LayerNorm configuration. The LayerNorm-based RFSQ consistently yields the best empirical results, with learnable scaling close behind. All RFSQ configurations outperform non-adaptive multi-stage quantization and VQ baselines (Zhu, 20 Aug 2025).
6. Connections to Adaptive Neural Codebook Residual Quantization
Alternative adaptive residual quantization methods, such as QINCo, introduce per-step neural modulation of codebooks. In QINCo, the residual at step $m$ is quantized using codewords $c_{m,k} = f_\theta(\bar{c}_{m,k}, \hat{x}_{m-1})$ produced by a neural network $f_\theta$ that conditions on both the partial reconstruction $\hat{x}_{m-1}$ and a fixed base codeword $\bar{c}_{m,k}$. The quantization index is selected as the minimizer of $\|x - \hat{x}_{m-1} - c_{m,k}\|_2^2$ over $k$, and residual reconstruction is performed accordingly (Huijben et al., 26 Jan 2024). This approach, while distinct from RFSQ, also adaptively tailors the quantizer to the local residual distribution, and demonstrates lower MSE than conventional residual quantization.
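The following is a simplified sketch of this idea with an arbitrary two-layer modulation network; the class name, hidden size, and shapes are illustrative and do not reproduce QINCo's actual architecture.

```python
import torch
import torch.nn as nn

class NeuralCodebookStep(nn.Module):
    """One residual-quantization step whose codewords are modulated by a small
    network conditioned on the partial reconstruction (QINCo-style sketch)."""
    def __init__(self, dim, num_codes, hidden=64):
        super().__init__()
        self.base = nn.Parameter(torch.randn(num_codes, dim) * 0.1)  # fixed-size base codebook
        self.modulator = nn.Sequential(                              # f_theta(base codeword, partial recon)
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x, x_partial):
        # Produce per-step codewords conditioned on the current partial reconstruction.
        num_codes = self.base.shape[0]
        cond = x_partial.unsqueeze(1).expand(-1, num_codes, -1)      # (B, K, D)
        base = self.base.unsqueeze(0).expand_as(cond)                # (B, K, D)
        codes = base + self.modulator(torch.cat([base, cond], dim=-1))
        residual = (x - x_partial).unsqueeze(1)                      # (B, 1, D)
        idx = (residual - codes).pow(2).sum(-1).argmin(dim=1)        # nearest adapted codeword
        chosen = codes[torch.arange(x.shape[0]), idx]                # (B, D)
        return x_partial + chosen, idx                               # new partial recon and index
```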
7. Significance and Broader Impact
Adaptive Residual Scalar Quantization, exemplified by RFSQ with learnable scaling and invertible normalization, addresses the critical magnitude collapse challenge inherent in multi-stage scalar quantization. The result is a fully reversible, training-stable, and plug-and-play quantization framework that maintains low distortion across all quantization stages. Empirical improvements in perceptual and $L_1$ losses, as well as PSNR, confirm the advantage of adaptive conditioning over non-adaptive and vector codebook approaches. These strategies enable the deployment of efficient, stable, and scalable neural compression systems, expanding the practical utility of scalar quantization in both learned and classical settings (Zhu, 20 Aug 2025, Huijben et al., 26 Jan 2024).