Centroid–Residual Quantization in ANN Search
- Centroid–Residual Quantization is a hierarchical vector quantization method that decomposes a data vector into a coarse centroid and a quantized residual.
- An enhanced variant, Transformed Residual Quantization, applies per-cluster orthogonal transformations to align residuals and reduce quantization error by up to 50% in some cases.
- This approach improves approximate nearest neighbor search by cutting storage and distance-computation cost relative to a single flat codebook of equivalent resolution, and it serves as a drop-in alternative to Product Quantization.
Centroid–Residual Quantization, often referred to as Residual Quantization (RQ), is a hierarchical vector quantization strategy that approximates a data vector as the sum of a coarse centroid and a quantized residual. This two-stage quantizer is widely used in large-scale approximate nearest neighbor (ANN) search, where it delivers compact codes and cheap distance evaluation. An enhanced variant, Transformed Residual Quantization (TRQ), introduces per-cluster linear transformations, restricted to orthogonal matrices, that align the residual distributions before the second quantization stage, thereby reducing quantization error and improving retrieval performance. Both models extend and can directly replace Product Quantization (PQ): the composed codebook has an effective size equal to the product of the stage sizes, while storage and assignment cost scale only with their sum (Yuan et al., 2015).
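For concreteness, with illustrative stage sizes of $K_1 = K_2 = 256$ (not tied to any particular benchmark in the paper), the two-stage code addresses

$$K_1 \times K_2 = 256 \times 256 = 65{,}536$$

effective reproduction values, yet only $K_1 + K_2 = 512$ centroids are stored and at most $K_1 + K_2$ centroid distances are evaluated per assignment, versus $65{,}536$ for a flat codebook of the same resolution.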
1. Formal Structure of Two-Stage Residual Quantization
Given a dataset $X = \{x_1, \dots, x_N\} \subset \mathbb{R}^d$, RQ first partitions the data using a coarse codebook $C^{(1)} = \{c^{(1)}_1, \dots, c^{(1)}_{K_1}\}$. Each vector $x$ is assigned to its nearest centroid via

$$i(x) = \arg\min_{1 \le i \le K_1} \big\| x - c^{(1)}_i \big\|^2,$$

with the first-stage reproduction $\hat{x}^{(1)} = c^{(1)}_{i(x)}$. The residual vector is defined as $r(x) = x - c^{(1)}_{i(x)}$. The second-stage codebook $C^{(2)} = \{c^{(2)}_1, \dots, c^{(2)}_{K_2}\}$ is learned by applying k-means clustering to the collection of residuals $\{r(x_n)\}_{n=1}^{N}$. Each residual is then assigned

$$j(x) = \arg\min_{1 \le j \le K_2} \big\| r(x) - c^{(2)}_j \big\|^2.$$

The complete two-stage quantizer reconstructs

$$Q(x) = c^{(1)}_{i(x)} + c^{(2)}_{j(x)},$$

minimizing the mean squared error (MSE)

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \big\| x_n - Q(x_n) \big\|^2.$$
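A minimal sketch of this two-stage construction, using scikit-learn's k-means for both stages; the codebook sizes and helper names (`train_rq`, `encode_rq`, `reconstruct_rq`) are illustrative rather than taken from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_rq(X, K1=256, K2=256, seed=0):
    """Two-stage residual quantizer: coarse codebook plus residual codebook."""
    # Stage 1: coarse k-means over the raw vectors.
    coarse = KMeans(n_clusters=K1, n_init=4, random_state=seed).fit(X)
    i_assign = coarse.labels_                             # i(x) for every training vector
    residuals = X - coarse.cluster_centers_[i_assign]     # r(x) = x - c1_{i(x)}

    # Stage 2: one shared k-means codebook over all residuals.
    fine = KMeans(n_clusters=K2, n_init=4, random_state=seed).fit(residuals)
    return coarse, fine

def encode_rq(X, coarse, fine):
    """Return the (i(x), j(x)) code pair for each vector."""
    i_assign = coarse.predict(X)
    residuals = X - coarse.cluster_centers_[i_assign]
    j_assign = fine.predict(residuals)
    return i_assign, j_assign

def reconstruct_rq(codes, coarse, fine):
    """Q(x) = c1_{i(x)} + c2_{j(x)}."""
    i_assign, j_assign = codes
    return coarse.cluster_centers_[i_assign] + fine.cluster_centers_[j_assign]

# Usage: MSE of the two-stage quantizer on synthetic data.
X = np.random.randn(10000, 64).astype(np.float32)
coarse, fine = train_rq(X, K1=64, K2=64)
X_hat = reconstruct_rq(encode_rq(X, coarse, fine), coarse, fine)
print("RQ MSE:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```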
2. Transformed Residual Quantization: Objective and Model Enhancement
In ordinary RQ, the residuals from each first-stage cluster generally exhibit heterogeneous orientations and scales. TRQ addresses this by learning a cluster-specific orthogonal transformation $R_i \in \mathbb{R}^{d \times d}$ for each residual cluster $i$. The representation becomes

$$Q_{\mathrm{TRQ}}(x) = c^{(1)}_{i(x)} + R_{i(x)}^{\top} c^{(2)}_{j(x)},$$

with each $R_i$ constrained to be orthogonal: $R_i^{\top} R_i = I$ for all $i$. Equivalently, each residual is rotated into a shared frame, $R_{i(x)} r(x)$, and quantized there by the common second-stage codebook.

The joint minimization objective for the first-stage codebook $C^{(1)}$, the second-stage codebook $C^{(2)}$, and the transforms $\{R_i\}_{i=1}^{K_1}$ is

$$\min_{C^{(1)},\, C^{(2)},\, \{R_i\}} \; \sum_{n=1}^{N} \Big\| x_n - c^{(1)}_{i(x_n)} - R_{i(x_n)}^{\top} c^{(2)}_{j(x_n)} \Big\|^2,$$

subject to $R_i^{\top} R_i = I$ for each $i = 1, \dots, K_1$.
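Once codebooks and rotations are given, the TRQ reconstruction and objective can be evaluated directly. A short sketch under the formulation above, with array shapes and function names chosen purely for illustration:

```python
import numpy as np

def trq_reconstruct(i_assign, j_assign, C1, C2, R):
    """Q_TRQ(x) = c1_{i(x)} + R_{i(x)}^T c2_{j(x)}.

    C1: (K1, d) coarse centroids, C2: (K2, d) second-stage centroids,
    R:  (K1, d, d) one orthogonal matrix per coarse cluster.
    """
    c2 = C2[j_assign]                     # (N, d) assigned second-stage centroids
    Rt = R[i_assign].transpose(0, 2, 1)   # (N, d, d) R_{i(x)}^T per vector
    back_rotated = np.einsum('nij,nj->ni', Rt, c2)
    return C1[i_assign] + back_rotated

def trq_objective(X, i_assign, j_assign, C1, C2, R):
    """Mean squared TRQ reconstruction error over the dataset."""
    diff = X - trq_reconstruct(i_assign, j_assign, C1, C2, R)
    return np.mean(np.sum(diff ** 2, axis=1))
```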
3. Alternating Optimization and Training Procedure
TRQ optimization employs block-coordinate descent with two alternating steps:
a. Codebook Update:
Fix the transformations $\{R_i\}$ and update the second-stage codebook $C^{(2)}$ together with the assignments $j(x)$. For each residual cluster $i$, one computes the transformed residuals $\{R_i\, r(x) : i(x) = i\}$, pools them across clusters, and applies k-means (or a product quantizer) to obtain $C^{(2)}$. Each rotated residual is then assigned to its nearest second-stage centroid.
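A sketch of this codebook-update step, assuming the rotations are held fixed and plain k-means (rather than a product quantizer) is applied to the pooled rotated residuals; the helper name `update_codebook` is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def update_codebook(residuals, i_assign, R, K2=256, seed=0):
    """Rotate each residual by its cluster's R_i, pool, and re-run k-means."""
    # R_{i(x)} r(x) for every training vector (batched matrix-vector product).
    rotated = np.einsum('nij,nj->ni', R[i_assign], residuals)

    # One shared second-stage codebook over the pooled rotated residuals.
    fine = KMeans(n_clusters=K2, n_init=4, random_state=seed).fit(rotated)
    C2 = fine.cluster_centers_     # new second-stage codebook
    j_assign = fine.labels_        # nearest second-stage centroid per rotated residual
    return C2, j_assign
```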
b. Transform Update:
Fix $C^{(2)}$ and the assignments, and update each $R_i$ by solving an orthogonal Procrustes problem. Let $A_i$ be the matrix whose rows are cluster $i$'s residuals and $B_i$ the matrix of their corresponding second-stage reconstructions (the assigned centroids $c^{(2)}_{j(x)}$). The update is

$$R_i = \arg\min_{R :\, R^{\top} R = I} \big\| A_i R^{\top} - B_i \big\|_F^2 .$$

This is solved via the SVD of the cross-covariance $M_i = B_i^{\top} A_i = U_i \Sigma_i V_i^{\top}$, giving $R_i = U_i V_i^{\top}$.
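A sketch of the per-cluster Procrustes step, following the SVD solution above; the helper name `update_rotation` and the row-major matrix layout are assumptions:

```python
import numpy as np

def update_rotation(A_i, B_i):
    """Orthogonal Procrustes: find R_i minimizing sum_x ||R_i r_x - b_x||^2.

    A_i: (n_i, d) residuals of cluster i (one residual r per row).
    B_i: (n_i, d) their assigned second-stage centroids (one b per row).
    """
    M = B_i.T @ A_i                  # d x d cross-covariance  B_i^T A_i
    U, _, Vt = np.linalg.svd(M)      # M = U Sigma V^T
    return U @ Vt                    # R_i = U V^T, orthogonal by construction
```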
A few dozen iterations typically suffice for convergence in practice (Yuan et al., 2015).
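Putting the two steps together, a compact sketch of the alternating loop; it reuses the hypothetical `update_codebook` and `update_rotation` helpers from the preceding sketches, and the fixed iteration count is illustrative:

```python
import numpy as np

def train_trq(residuals, i_assign, K1, K2=256, n_iter=30, seed=0):
    """Block-coordinate descent over the second-stage codebook and rotations."""
    d = residuals.shape[1]
    R = np.repeat(np.eye(d)[None, :, :], K1, axis=0)   # start from identity rotations

    for _ in range(n_iter):
        # (a) Codebook update with rotations fixed.
        C2, j_assign = update_codebook(residuals, i_assign, R, K2=K2, seed=seed)
        # (b) Per-cluster Procrustes update with codebook and assignments fixed.
        for i in range(K1):
            mask = (i_assign == i)
            if mask.any():
                R[i] = update_rotation(residuals[mask], C2[j_assign[mask]])
    return C2, R, j_assign
```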
4. Quantization Error and Empirical Results
Quantization error in TRQ and its predecessors is measured via MSE:
- Ordinary RQ: $\mathrm{MSE}_{\mathrm{RQ}} = \frac{1}{N} \sum_{n=1}^{N} \big\| x_n - c^{(1)}_{i(x_n)} - c^{(2)}_{j(x_n)} \big\|^2$
- TRQ: $\mathrm{MSE}_{\mathrm{TRQ}} = \frac{1}{N} \sum_{n=1}^{N} \big\| x_n - c^{(1)}_{i(x_n)} - R_{i(x_n)}^{\top} c^{(2)}_{j(x_n)} \big\|^2$
where the orthogonality of each $R_i$ ensures that residual norms and distances are preserved ($\|R_i r\| = \|r\|$), so the error is measured in the same geometry as ordinary RQ, but with markedly improved codebook alignment.
Empirical results indicate substantial error reductions for TRQ versus optimized product quantization (OPQ): on SIFT1M, MSE is reduced by approximately 25%, and on MNIST reductions reach up to 50%. GIST1M, whose features are already close to isotropic, sees a milder MSE improvement of around 10% (Yuan et al., 2015).
5. Application to Large-Scale Approximate Nearest Neighbor Search
TRQ demonstrates significant performance enhancements in ANN search, particularly with inverted-index search frameworks. Because query-time cost is dominated by the number of first-stage cells visited, the additional cost of evaluating a small number of orthogonal projections (one per active cluster) is minimal.
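A sketch of the corresponding query-time search under an inverted-index layout, assuming the reconstruction $x \approx c^{(1)}_i + R_i^{\top} c^{(2)}_j$ so that $\|q - x\|^2 \approx \|R_i (q - c^{(1)}_i) - c^{(2)}_j\|^2$, meaning the query residual needs to be rotated only once per probed cell; all names (`search_ivf_trq`, `inv_lists`, `codes`) are illustrative:

```python
import numpy as np

def search_ivf_trq(q, C1, C2, R, inv_lists, codes, n_probe=8, topk=10):
    """Asymmetric distance computation with per-cluster rotations.

    C1: (K1, d) coarse centroids; C2: (K2, d) second-stage centroids;
    R:  (K1, d, d) orthogonal transforms; inv_lists[i]: ids of vectors in cell i;
    codes: array mapping a vector id to its stored second-stage code j(x).
    """
    # Probe the n_probe nearest coarse cells.
    coarse_d = np.sum((C1 - q) ** 2, axis=1)
    probes = np.argsort(coarse_d)[:n_probe]

    cand_ids, cand_d = [], []
    for i in probes:
        r_q = R[i] @ (q - C1[i])                  # rotate the query residual once per cell
        lut = np.sum((C2 - r_q) ** 2, axis=1)     # distance to every second-stage centroid
        ids = inv_lists[i]
        if len(ids) == 0:
            continue
        cand_ids.append(ids)
        cand_d.append(lut[codes[ids]])            # one table lookup per stored code

    if not cand_ids:
        return np.array([], dtype=int), np.array([])
    ids = np.concatenate(cand_ids)
    d = np.concatenate(cand_d)
    order = np.argsort(d)[:topk]
    return ids[order], d[order]
```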
On the SIFT1B benchmark with 16-byte codes and recall@1 measured at shortlist sizes $T = 10{,}000$ and $T = 30{,}000$, results included:
- $T = 10{,}000$: OPQ R@1 = 0.359, TRQ R@1 = 0.426 (+0.067)
- $T = 30{,}000$: OPQ R@1 = 0.379, TRQ R@1 = 0.446 (+0.067)
Similar gains appear for recall@10 and recall@50. On medium-scale datasets (SIFT1M, GIST1M, MNIST), TRQ increases Recall@1 by 5–10 percentage points over OPQ and by 7–12 points over vanilla PQ (Yuan et al., 2015).
6. Comparative Significance and Interpretations
The critical advance of TRQ is the explicit per-cluster alignment of residual distributions prior to the second-stage quantization, realized via orthogonal transformations that enable more effective codebook partitioning. This produces substantially lower quantization error and commensurate improvements in ANN search recall, especially where residuals are anisotropic across clusters.
The magnitude of the improvement depends on the structure of the residual spaces: greater diversity in residual orientation or scale across clusters favors TRQ, whereas datasets whose residuals are already close to isotropic see a smaller effect. A plausible implication is that further gains may come from combining per-cluster transformations with other vector quantization techniques, particularly on strongly structured data (Yuan et al., 2015).