Residual Neural Quantization (QINCo/QINCo2)

Updated 20 March 2026
  • Residual Neural Quantization (QINCo/QINCo2) is a data-adaptive approach that replaces static codebooks with neural networks producing implicit centroids conditioned on past quantizations.
  • The method refines centroids through learned corrective terms and cross-stage attention, significantly improving rate-distortion performance and operational efficiency.
  • Its integration into applications such as neural codecs and billion-scale ANN search yields practical gains in reconstruction error and search recall.

Residual Neural Quantization (QINCo/QINCo2) encompasses a family of data-adaptive, multi-stage quantization methods that replace the static codebooks of classical residual quantization (RQ) with small neural networks producing “implicit” codebooks conditioned on the quantized history. These methods achieve state-of-the-art performance for vector compression, approximate nearest neighbor (ANN) search, and neural codec design, notably advancing both rate-distortion and operational efficiency at scale.

1. Mathematical Formulation of Residual Neural Quantization

Let $x \in \mathbb{R}^D$ be a target vector to be quantized using $N$ sequential codebooks. Classical RQ represents $x$ as the sum $\hat{x}_{N+1} = \sum_{n=1}^{N} c_n^{k_n}$, where each $c_n^{k_n}$ is a centroid selected from a fixed codebook $C_n = \{c_n^1, \dots, c_n^K\}$ at stage $n$ based on the current residual $r_n = x - \hat{x}_n$:

$$k_n = \arg\min_{k \in \{1, \dots, K\}} \| r_n - c_n^k \|_2^2, \qquad \hat{x}_{n+1} = \hat{x}_n + c_n^{k_n}.$$

In QINCo/QINCo2, each centroid is produced by a neural network $f_{\theta_n}$, conditioned on the intermediate reconstruction $\hat{x}_n$ and a base centroid $\bar{c}_n^k$:

$$c_n^k = f_{\theta_n}\left(\hat{x}_n, \bar{c}_n^k\right).$$

The selection rule is retained, but the centroid itself is adaptive:

$$k_n = \arg\min_{k} \| r_n - c_n^k \|_2^2, \qquad \hat{x}_{n+1} = \hat{x}_n + c_n^{k_n}.$$

This neuralization of codebooks transforms residual quantization into a parameter-rich, data-dependent process, where the codebook at each stage is implicitly a function of the quantization path and the input statistics (Huijben et al., 2024, Vallaeys et al., 6 Jan 2025, Lahrichi et al., 19 Mar 2025).
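
To make the formulation concrete, here is a minimal sketch of greedy QINCo-style encoding in Python. The callables `f_theta[n]` and the base codebooks `C_bar[n]` are placeholders for the learned stage networks and k-means centroids; names and shapes are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def qinco_encode(x, C_bar, f_theta):
    """Greedy residual encoding with implicit (neural) codebooks.

    x       : (D,) target vector
    C_bar   : list of N base codebooks, each of shape (K, D)
    f_theta : list of N callables; f_theta[n](x_hat, C_bar[n]) returns the
              (K, D) refined centroids conditioned on the partial reconstruction
    """
    x_hat = np.zeros_like(x)
    codes = []
    for n in range(len(C_bar)):
        residual = x - x_hat                         # r_n = x - x_hat_n
        centroids = f_theta[n](x_hat, C_bar[n])      # implicit codebook {c_n^k}
        dists = np.sum((residual - centroids) ** 2, axis=1)
        k = int(np.argmin(dists))                    # k_n = argmin_k ||r_n - c_n^k||^2
        codes.append(k)
        x_hat = x_hat + centroids[k]                 # x_hat_{n+1} = x_hat_n + c_n^{k_n}
    return codes, x_hat                              # stored codes and reconstruction
```

Decoding simply replays the same additions from the stored codes, so encoder and decoder share the stage networks.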

2. Implicit Neural Codebooks: Architecture and Parameterization

The implicit codebook for stage nn is generated as follows:

  • Each $\bar{c}_n^k \in \mathbb{R}^D$ is a base centroid (typically from k-means) and is fixed or fine-tuned.
  • The neural network $f_{\theta_n}(\cdot)$ receives $[\hat{x}_n; \bar{c}_n^k] \in \mathbb{R}^{2D}$ as input and outputs a refined centroid.
  • The architecture for $f_{\theta_n}$ consists of:
    • An initial affine layer projecting from $\mathbb{R}^{2D}$ to $\mathbb{R}^{d_h}$.
    • $L$ residual MLP blocks (width $d_h$).
    • A final affine projection from $\mathbb{R}^{d_h}$ to $\mathbb{R}^{D}$.
    • A skip connection ensuring $f_{\theta_n}(\hat{x}_n, \bar{c}_n^k) = \bar{c}_n^k + h(\hat{x}_n, \bar{c}_n^k)$, so the network learns a corrective term (see the sketch after this list).
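
A rough PyTorch rendering of one such stage is given below. The hidden width, number of blocks, and activation are illustrative assumptions rather than the published configuration; the point is the structure: concatenate $[\hat{x}_n; \bar{c}_n^k]$, pass it through residual MLP blocks, and add the output back onto the base centroid.

```python
import torch
import torch.nn as nn

class ImplicitCodebookStage(nn.Module):
    """One QINCo-style stage: refines K base centroids conditioned on x_hat."""

    def __init__(self, dim: int, hidden: int = 256, n_blocks: int = 2):
        super().__init__()
        self.proj_in = nn.Linear(2 * dim, hidden)             # affine: R^{2D} -> R^{d_h}
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden))
            for _ in range(n_blocks)                          # L residual MLP blocks
        ])
        self.proj_out = nn.Linear(hidden, dim)                # affine: R^{d_h} -> R^D

    def forward(self, x_hat: torch.Tensor, c_bar: torch.Tensor) -> torch.Tensor:
        # x_hat: (D,) partial reconstruction; c_bar: (K, D) base centroids.
        h = self.proj_in(torch.cat([x_hat.expand_as(c_bar), c_bar], dim=-1))
        for block in self.blocks:
            h = h + block(h)                                  # residual MLP block
        return c_bar + self.proj_out(h)                       # skip connection: base + correction
```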

QINCo2 further refines this blueprint by:

  • Sharing/tying parameters across stages to lower memory cost.
  • Incorporating cross-stage attention, so each stage’s MLP can depend on all prior residuals, not only $\hat{x}_n$.
  • Improving base centroid initialization and using warm starts.
  • Speeding up codeword lookup through efficient architecture design (Vallaeys et al., 6 Jan 2025, Lahrichi et al., 19 Mar 2025).

3. Training Objectives and Algorithms

For end-to-end training, QINCo and QINCo2 optimize the sum of squared quantization errors across all residual stages over all training samples:

$$\mathcal{L}_{\mathrm{quant}} = \sum_{x \in X} \sum_{n=1}^{N} \| r_n(x) - c_n^{k_n(x)} \|_2^2.$$

Key points:

  • The quantization indices $k_n$ are obtained via hard nearest-neighbor search; no commitment loss or straight-through estimator is required (see the sketch after this list).
  • During training, gradients are backpropagated only into the parameters of $f_{\theta_n}$ (and optionally the base centroids).
  • In pipeline applications such as neural codecs (QINCODEC), encoder and quantizer are frozen when the decoder is fine-tuned; only the decoder receives gradients (Lahrichi et al., 19 Mar 2025).
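
Why no straight-through estimator is needed can be seen directly in code: the argmin only produces an integer index, while the loss is taken on the selected refined centroid, which is an ordinary differentiable output of $f_{\theta_n}$. The sketch below reuses the hypothetical `ImplicitCodebookStage` from Section 2; whether the running reconstruction is detached between stages is a design detail not fixed by this sketch.

```python
import torch

def qinco_training_loss(x, stages, base_codebooks):
    """Sum of per-stage squared errors; gradients reach only the stage networks.

    x              : (D,) training vector
    stages         : list of ImplicitCodebookStage modules, one per stage
    base_codebooks : list of (K, D) tensors of base centroids
    """
    x_hat = torch.zeros_like(x)
    loss = x.new_zeros(())
    for stage, c_bar in zip(stages, base_codebooks):
        residual = x - x_hat
        centroids = stage(x_hat, c_bar)                        # (K, D), differentiable
        with torch.no_grad():                                  # hard nearest-neighbor selection
            k = torch.argmin(((residual - centroids) ** 2).sum(dim=-1))
        loss = loss + ((residual - centroids[k]) ** 2).sum()   # ||r_n - c_n^{k_n}||_2^2
        x_hat = (x_hat + centroids[k]).detach()                # simple choice: later stages see a fixed x_hat
    return loss
```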

The greedy (or beam) search during encoding selects, at each stage, the index $k_n$ minimizing the distortion to the current residual; decoding reconstructs $x$ from the sum of the selected refined centroids. In QINCo2, beam search with codeword pre-selection is used: a lightweight scorer $g_{\phi_n}$ pre-selects a subset of candidates to minimize evaluation cost, and a beam of width $B$ maintains multiple partial reconstructions for improved accuracy at higher computational load (Vallaeys et al., 6 Jan 2025).
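
The pre-selection idea can be sketched as follows. For simplicity the shortlist here is ranked by distance to the base centroids, standing in for the learned scorer $g_{\phi_n}$, and the beam is omitted (a beam-search variant would keep the $B$ best partial reconstructions instead of a single one).

```python
import numpy as np

def encode_stage_with_preselection(residual, x_hat, c_bar, f_theta_n, num_candidates=8):
    """One encoding stage: cheap shortlist, then exact evaluation of refined centroids.

    residual  : (D,) current residual r_n
    x_hat     : (D,) current partial reconstruction
    c_bar     : (K, D) base centroids
    f_theta_n : callable (x_hat, candidates) -> refined centroids of the same shape
    """
    # Cheap proxy score (stand-in for g_phi): distance to the *base* centroids.
    proxy = np.sum((residual - c_bar) ** 2, axis=1)
    shortlist = np.argsort(proxy)[:num_candidates]

    # Exact evaluation of the shortlisted candidates with the full stage network.
    refined = f_theta_n(x_hat, c_bar[shortlist])
    exact = np.sum((residual - refined) ** 2, axis=1)
    j = int(np.argmin(exact))
    return int(shortlist[j]), refined[j]          # chosen index k_n and centroid c_n^{k_n}
```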

4. Pipeline Integration and Applications

QINCo/QINCo2 integrate directly into modular pipelines such as QINCODEC for neural audio compression (Lahrichi et al., 19 Mar 2025):

  1. Autoencoder pretraining: A continuous autoencoder is trained on the raw domain (e.g., waveforms for audio) with spectral and adversarial losses, no quantization bottleneck.
  2. Offline quantizer training: The pre-trained encoder produces a large set of latent representations. The QINCo2 quantizer is fit on this dataset, selecting the number of stages $N$ and codebook size $K$ to meet a target bitrate via $\text{bits/sec} = F \cdot N \cdot \log_2 K$, where $F$ is the frame rate (a sizing example follows this list).
  3. Decoder fine-tuning: With encoder and quantizer fixed, the decoder is fine-tuned using quantized latents. This stage restores fidelity lost to quantization, training only the decoder and discriminator.
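
As a sizing illustration of the bitrate formula above (the frame rate and codebook size here are made-up values, not QINCODEC's actual configuration):

```python
import math

def stages_for_bitrate(target_bps: float, frame_rate: float, codebook_size: int) -> int:
    """Largest N such that frame_rate * N * log2(codebook_size) <= target_bps."""
    bits_per_stage = frame_rate * math.log2(codebook_size)
    return int(target_bps // bits_per_stage)

# Example: a 16 kbps target at 50 frames/s with K = 4096 (12 bits per code)
# allows N = 26 residual stages.
print(stages_for_bitrate(16_000, 50, 4096))  # -> 26
```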

In large-scale vector search, the QINCo2 encoding and decoding steps are adapted for billion-scale nearest neighbor indices. To expedite decoding for retrieval, pairwise additive decoders are trained on pairs of indices, approximating the neural decoded vector as a sum of a small set of table-lookup terms with minimal accuracy loss (Vallaeys et al., 6 Jan 2025).
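
The pairwise additive decoder can be pictured as replacing the neural decode with table lookups. The sketch below assumes precomputed tables indexed by pairs of consecutive stage codes; this is a simplification of the trained pairwise decoders described in (Vallaeys et al., 6 Jan 2025), intended only to show why decoding reduces to a handful of additions.

```python
import numpy as np

def pairwise_additive_decode(codes, pair_tables):
    """Approximate decoding as a sum of table lookups.

    codes       : list of N stage indices [k_1, ..., k_N]
    pair_tables : list of N-1 arrays; pair_tables[n] has shape (K, K, D) and is
                  trained to approximate the joint contribution of stages n and n+1
    """
    terms = [pair_tables[n][codes[n], codes[n + 1]] for n in range(len(pair_tables))]
    return np.sum(terms, axis=0)
```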

5. Empirical Results and Impact

QINCo2 consistently improves rate–distortion performance, ANN search recall, and codebook utilization compared to previous methods. Representative results (Vallaeys et al., 6 Jan 2025, Lahrichi et al., 19 Mar 2025):

| Dataset | Code size | Metric | RQ/RVQ | QINCo | QINCo2 |
|---|---|---|---|---|---|
| BigANN1M | 16 B/v | MSE (×1e-4) | 1.30 | 0.32 | 0.18 |
| Deep1M | 8 B/v | Recall@1 (%) | 21.4 | 36.3 | 45.1 |
| QINCODEC (audio) | 16 kbps | Si-SDR (dB) | 6.09 | – | 7.22 |
| QINCODEC (audio) | 16 kbps | MS-Mel | 0.96 | – | 0.79 |
  • On vector datasets, QINCo2 reduces reconstruction MSE by up to 34% over QINCo and raises Recall@1 by 24% for high-compression settings.
  • On audio, replacing RVQ with QINCo2 in QINCODEC increases reconstructed SDR by ~1 dB and decreases mel error by ~0.2 across bitrates, with higher codebook perplexity indicating better codebook usage.
  • QINCo2’s improvements are robust to overparameterization, larger data regimes, and various modality types (vision, speech, text embeddings) (Vallaeys et al., 6 Jan 2025, Lahrichi et al., 19 Mar 2025).

6. Extensions, Limitations, and Practical Considerations

Extensions:

  • QINCo2’s architecture scales to dynamic multi-rate compression by truncating the number of residual stages at decode-time with little MSE or recall loss.
  • Further gains are observed by extending from pairwise to higher-order combinatorial decoders, though at increased storage and training complexity.

Operational Trade-offs:

  • Encoding is necessarily slower than with fixed RQ because of the network evaluations, but codeword pre-selection and the beam width provide a tunable trade-off between accuracy and speed.
  • Decoding via the full neural stack takes microseconds, but pairwise additive decoders enable near-classical lookup efficiency with minimal distortion loss (Vallaeys et al., 6 Jan 2025).

Limitations and Open Challenges:

  • Optimal design of pre-selection/beam strategies for variable-bitrate and low-power scenarios remains active research.
  • Memory footprint of large, adaptive neural codebooks must be balanced against accuracy, especially for extremely high-dimensional data.
  • Extending implicit neural codebooks to triplet- or higher-order recombinations could yield further accuracy, but increases index complexity and training overhead.

A plausible implication is that neural residual quantizers with implicit codebooks will increasingly supplant fixed-codebook baselines in high-fidelity compression and billion-scale retrieval, particularly as modeling and hardware advances further reduce inference and decode costs.

7. Relation to High-Order Residual Quantization in Network Acceleration

The QINCo2 methodology is closely related to high-order residual quantization (HORQ) in network binarization (Li et al., 2017). In HORQ, any vector $x \in \mathbb{R}^n$ is approximated by a sum of $K$ signed binary vectors with decreasing residual energy:

$$x \approx \sum_{i=1}^{K} \alpha_i b_i,$$

with recursive updates $R_i(x) = R_{i-1}(x) - \alpha_i b_i$, where $b_i = \operatorname{sign}(R_{i-1}(x))$ and $\alpha_i = \frac{1}{n} \| R_{i-1}(x) \|_1$.
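
A direct numpy transcription of this recursion (illustrative only; it follows the formulas above with $R_0(x) = x$ and a conventional choice of sign for zero entries):

```python
import numpy as np

def horq_approximation(x: np.ndarray, K: int):
    """High-order residual quantization: x ≈ sum_i alpha_i * b_i."""
    residual = x.astype(np.float64)
    alphas, binaries = [], []
    for _ in range(K):
        b = np.sign(residual)
        b[b == 0] = 1.0                        # treat sign(0) as +1
        alpha = np.abs(residual).mean()        # alpha_i = (1/n) * ||R_{i-1}||_1
        alphas.append(alpha)
        binaries.append(b)
        residual = residual - alpha * b        # R_i = R_{i-1} - alpha_i * b_i
    approx = sum(a * b for a, b in zip(alphas, binaries))
    return alphas, binaries, approx
```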

QINCo2 generalizes this approach from binarized settings to vector quantization with data-adaptive neural codebooks, retaining the greedy residual structure but introducing complex, context-dependent centroids and learned mapping functions. In the context of neural network acceleration, the same principle enables layers that operate with multiple binary maps, providing improved accuracy–speed trade-offs by capturing more residual detail with each quantization step (Li et al., 2017).


The development of QINCo and QINCo2 represents a major advance in residual vector quantization by introducing flexible, data-conditioned, and tractable codebooks, leading to superior empirical performance in both lossy compression and approximate search (Huijben et al., 2024, Vallaeys et al., 6 Jan 2025, Lahrichi et al., 19 Mar 2025).
