Adaptive Residual Scalar Quantization

Updated 22 December 2025
  • The paper introduces adaptive conditioning strategies such as learnable scaling and invertible layer normalization to counter residual magnitude decay in multi‐stage scalar quantization.
  • It employs a cascaded framework that refines quantization progressively using finite scalar quantizers combined with straight-through gradient estimators.
  • Empirical results demonstrate significant reductions in perceptual loss and L1 error, outperforming traditional quantization methods in neural compression tasks.

Adaptive Residual Scalar Quantization (ARSQ) refers to a class of quantization methods in which a signal is quantized using multiple sequential scalar quantizers, with each stage quantizing the residual error of the previous one. Critically, adaptive mechanisms—such as stage-wise learnable scaling or normalization—are introduced to address dynamic range mismatch in the residuals, preserving quantization effectiveness throughout all stages. ARSQ achieves improved rate-distortion tradeoffs and stability compared to traditional vector quantization and non-adaptive multi-stage scalar quantization, and has demonstrated robust performance in neural compression pipelines and large-scale retrieval settings (Zhu, 20 Aug 2025, Huijben et al., 26 Jan 2024).

1. Finite Scalar Quantization and Multi-Stage Residual Frameworks

Finite Scalar Quantization (FSQ) is defined by mapping each real-valued coordinate independently to one of $L_i$ uniformly spaced levels in $[-1,1]$. For a scalar $z_i \in [-1,1]$, the quantization operator is

$$\mathrm{FSQ}_i(z_i) = \mathrm{round}\!\Bigl(\tfrac{z_i (L_i-1)}{2}\Bigr) \times \tfrac{2}{L_i-1},$$

and the quantized vector in $d$ dimensions is $\mathbf{q} = [\mathrm{FSQ}_1(z_1), \dots, \mathrm{FSQ}_d(z_d)]$. To facilitate gradient-based training, the straight-through estimator with an identity Jacobian is used during backpropagation.
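
A minimal PyTorch sketch of this operator with the straight-through estimator is given below; the helper name `fsq` and the input clamp are illustrative choices, not taken from the paper.

```python
import torch

def fsq(z: torch.Tensor, levels: int) -> torch.Tensor:
    """Quantize each coordinate of z (assumed to lie in [-1, 1]) onto a uniform
    grid with spacing 2 / (levels - 1), following the FSQ formula above."""
    z = torch.clamp(z, -1.0, 1.0)
    q = torch.round(z * (levels - 1) / 2.0) * (2.0 / (levels - 1))
    # Straight-through estimator: the forward pass returns q, while the
    # backward pass sees an identity Jacobian through z.
    return z + (q - z).detach()
```

With `levels = 8`, for example, each coordinate is snapped to a grid point spaced 2/7 apart.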

Residual quantization extends scalar quantization by cascading $R$ quantization stages. The $r$-th stage quantizes its input $x_r$ to obtain the approximation $\hat x_r = \mathrm{FSQ}_r(x_r)$ and propagates the residual $x_{r+1} = x_r - \hat x_r$. The total reconstruction is the sum $\hat x = \sum_{r=1}^{R} \hat x_r$ (Zhu, 20 Aug 2025). For scalar or vector $x$, this framework enables progressive refinement of the quantized representation, as in classical residual quantization methods (Huijben et al., 26 Jan 2024).
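
Reusing the `fsq` helper sketched above, a non-adaptive cascade might look as follows; the per-stage level list is an illustrative parameterization.

```python
def residual_fsq(x: torch.Tensor, levels_per_stage: list[int]) -> torch.Tensor:
    """Cascade of R = len(levels_per_stage) FSQ stages: each stage quantizes
    the residual left by the previous one, and the reconstruction is the sum
    of the per-stage outputs."""
    residual, x_hat = x, torch.zeros_like(x)
    for levels in levels_per_stage:
        q_r = fsq(residual, levels)      # \hat{x}_r = FSQ_r(x_r)
        x_hat = x_hat + q_r              # \hat{x} = sum_r \hat{x}_r
        residual = residual - q_r        # x_{r+1} = x_r - \hat{x}_r
    return x_hat
```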

2. Residual Magnitude Decay and Its Consequences

In conventional multi-stage FSQ or residual quantization, a severe residual magnitude decay problem arises. Each successive residual $x_{r+1}$ has norm

$$\|x_{r+1}\| = \|x_r - \mathrm{FSQ}_r(x_r)\| \ll \|x_r\|,$$

and empirically this decay is close to exponential: $\|x_r\| \approx \epsilon^{r-1} \|x_1\|$ for some $\epsilon < 1$ (Zhu, 20 Aug 2025). This collapse in residual dynamic range deprives later stages of meaningful signal, leading to ineffective quantization in higher-order stages and suboptimal rate-distortion performance.
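
The effect is easy to reproduce with the non-adaptive sketch above; the snippet below uses synthetic uniform data, and the printed values are illustrative only.

```python
x = torch.empty(4096, 16).uniform_(-1.0, 1.0)
r1 = x - fsq(x, levels=8)            # residual after stage 1
print(r1.abs().max())                # ~0.14: dynamic range collapsed from [-1, 1]
q2 = fsq(r1, levels=8)               # un-adapted stage 2 on the shrunken residual
print((q2 == 0.0).float().mean())    # ~1.0: stage 2 quantizes almost everything to zero
```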

3. Adaptive Conditioning Strategies in Residual Scalar Quantization

Robust Residual Finite Scalar Quantization (RFSQ) introduces two reversible, adaptive conditioning strategies to counter residual magnitude decay: learnable scaling factors and invertible layer normalization.

a. Learnable Scaling Factors

At each quantization stage $r$, a positive scalar $\alpha_r$ scales the residual, $\tilde x_r = \alpha_r x_r$. The scaled input $\tilde x_r$ is quantized and then rescaled back:

$$\hat q_r = \mathrm{FSQ}_r(\tilde x_r), \qquad \hat x_r = \frac{\hat q_r}{\alpha_r}.$$

The next residual is $x_{r+1} = x_r - \hat x_r$. Each $\alpha_r$ is optimized jointly with the model parameters during end-to-end training, tuning the residual to align with the FSQ quantization grid.
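
A sketch of one such stage, again building on the `fsq` helper above; the softplus parameterization used to keep $\alpha_r$ positive is an assumption, not a detail from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledFSQStage(nn.Module):
    """One residual stage that multiplies by a learnable alpha_r before FSQ
    and divides by alpha_r afterwards."""
    def __init__(self, levels: int):
        super().__init__()
        self.levels = levels
        self.raw_scale = nn.Parameter(torch.zeros(()))   # unconstrained parameter

    def forward(self, x_r: torch.Tensor):
        alpha = F.softplus(self.raw_scale) + 1e-4        # alpha_r > 0
        x_hat_r = fsq(alpha * x_r, self.levels) / alpha  # quantize scaled residual, rescale back
        return x_hat_r, x_r - x_hat_r                    # stage output, next residual
```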

b. Invertible Layer Normalization

For each residual, compute the empirical mean $\mu_r$ and standard deviation $\sigma_r$:

$$\mu_r = \tfrac{1}{d}\sum_{i=1}^d (x_r)_i, \qquad \sigma_r = \sqrt{\tfrac{1}{d}\sum_{i=1}^d \bigl((x_r)_i - \mu_r\bigr)^2}.$$

Normalize, quantize, and then invert the normalization:

$$\tilde x_r = \frac{x_r - \mu_r}{\sigma_r}, \qquad \hat q_r = \mathrm{FSQ}_r(\tilde x_r), \qquad \hat x_r = \sigma_r \hat q_r + \mu_r.$$

Each stage therefore quantizes a zero-mean, unit-variance input, preserving a robust dynamic range and keeping sequential quantization effective. Both strategies remain perfectly invertible up to quantization error (Zhu, 20 Aug 2025).
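
A corresponding sketch of a stage with invertible layer normalization; the feature-dimension convention and the epsilon added for numerical stability are assumptions made here.

```python
def layernorm_fsq_stage(x_r: torch.Tensor, levels: int, eps: float = 1e-6):
    """Normalize the residual to zero mean and unit variance over its last
    dimension, quantize with FSQ, then invert the normalization."""
    mu = x_r.mean(dim=-1, keepdim=True)
    sigma = ((x_r - mu).pow(2).mean(dim=-1, keepdim=True) + eps).sqrt()
    q = fsq((x_r - mu) / sigma, levels)   # quantize the normalized residual
    x_hat_r = sigma * q + mu              # invert the normalization
    return x_hat_r, x_r - x_hat_r         # stage output, next residual
```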

4. Training Objectives in Adaptive Residual Scalar Quantization

RFSQ is trained within a neural compression pipeline using pixel-level and perceptual objectives. For an input image $\mathbf{x}$ and reconstruction $\hat{\mathbf{x}}$, the loss is

$$\mathcal{L} = \|\mathbf{x} - \hat{\mathbf{x}}\|_1 + \mathrm{LPIPS}(\mathbf{x}, \hat{\mathbf{x}}),$$

combining the $L_1$ distance for pixel fidelity with the LPIPS perceptual loss. No additional regularizers or codebook-utilization terms are required, because FSQ and the adaptive conditioning are fully invertible and do not involve learned codebooks (Zhu, 20 Aug 2025).
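
A hedged sketch of this objective using the publicly available `lpips` package; the VGG backbone and the assumption that images are scaled to $[-1,1]$ are choices made here, not details specified by the paper.

```python
import torch
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg")  # learned perceptual similarity metric

def reconstruction_loss(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """L1 pixel loss plus LPIPS perceptual loss; x and x_hat are image batches
    of shape (B, 3, H, W) scaled to [-1, 1]."""
    l1 = (x - x_hat).abs().mean()
    perceptual = lpips_fn(x, x_hat).mean()
    return l1 + perceptual
```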

5. Empirical Performance and Comparative Results

Comprehensive evaluations on ImageNet at a fixed 12.0-bit rate (4,096 total quantization indices) indicate that RFSQ variants significantly outperform VQ-EMA, FSQ, and LFQ baselines:

Method                  | Perceptual Loss | $L_1$ Error | PSNR
FSQ                     | 0.182           | 0.143       | 20.3 dB
RFSQ-4×1024 (LayerNorm) | 0.100           | 0.102       | 22.9 dB

These results reflect a 45% relative reduction in perceptual loss and a 28.7% reduction in $L_1$ error for the RFSQ-4×1024-LayerNorm configuration. The LayerNorm-based RFSQ consistently yields the best empirical results, with learnable scaling following closely. All RFSQ configurations outperform non-adaptive multi-stage quantization and VQ baselines (Zhu, 20 Aug 2025).

6. Connections to Adaptive Neural Codebook Residual Quantization

Alternative adaptive residual quantization methods, such as QINCo, introduce per-step neural modulation of codebooks. In QINCo, the residual $r^m$ at step $m$ is quantized using codewords produced by a neural network $f_m$ that conditions on both the partial reconstruction and a fixed base codeword:

$$c^m_k = f_m\bigl([\hat x^m; \bar c^m_k]; \theta_m\bigr).$$

The quantization index $i^m$ is selected as the minimizer of $\|r^m - c^m_k\|^2$, and the residual reconstruction is updated accordingly (Huijben et al., 26 Jan 2024). While distinct from RFSQ, this approach also adaptively tailors the quantizer to the local residual distribution and achieves lower MSE than conventional residual quantization.
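
The following sketch illustrates the conditioning mechanism only; the MLP shape, the concatenation layout, and the class name `NeuralCodebookStep` are illustrative assumptions rather than the published QINCo architecture.

```python
import torch
import torch.nn as nn

class NeuralCodebookStep(nn.Module):
    """One step in which a network f_m adapts a fixed base codebook to the
    partial reconstruction, then picks the nearest adapted codeword."""
    def __init__(self, dim: int, codebook_size: int, hidden: int = 256):
        super().__init__()
        self.base_codebook = nn.Parameter(torch.randn(codebook_size, dim))  # \bar{c}^m_k
        self.f_m = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, residual: torch.Tensor, x_hat: torch.Tensor):
        B, K = residual.shape[0], self.base_codebook.shape[0]
        # c^m_k = f_m([x_hat ; base_k]) for every codeword k and every input.
        cond = torch.cat([x_hat.unsqueeze(1).expand(B, K, -1),
                          self.base_codebook.unsqueeze(0).expand(B, K, -1)], dim=-1)
        codewords = self.f_m(cond)                                    # (B, K, dim)
        # i^m minimizes ||r^m - c^m_k||^2; update reconstruction and residual.
        idx = (residual.unsqueeze(1) - codewords).pow(2).sum(-1).argmin(dim=1)
        chosen = codewords[torch.arange(B, device=residual.device), idx]
        return idx, x_hat + chosen, residual - chosen
```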

7. Significance and Broader Impact

Adaptive Residual Scalar Quantization, exemplified by RFSQ with learnable scaling and invertible normalization, addresses the critical magnitude collapse challenge inherent in multi-stage scalar quantization. The result is a fully reversible, training-stable, and plug-and-play quantization framework that maintains low distortion across all quantization stages. Empirical improvements in perceptual and $L_1$ losses, as well as PSNR, confirm the advantage of adaptive conditioning over non-adaptive and vector codebook approaches. These strategies enable the deployment of efficient, stable, and scalable neural compression systems, expanding the practical utility of scalar quantization in both learned and classical settings (Zhu, 20 Aug 2025, Huijben et al., 26 Jan 2024).
