RQ-OPQ: Hierarchical Quantization for Recommendations

Updated 16 October 2025

RQ-OPQ encoding is a hierarchical framework that combines Residual Quantization and Optimized Product Quantization to capture both collaborative and item-specific features.
It employs a multi-layer semantic ID stack with three RQ stages and two OPQ stages, using adaptive fusion and contrastive learning to align feature embeddings.
Empirical results demonstrate improvements in AUC, GAUC, buyer engagement, and order conversion, especially enhancing performance for cold-start items.

RQ-OPQ encoding refers to a hierarchical vector quantization framework that fuses Residual Quantization (RQ) with Optimized Product Quantization (OPQ) to yield semantic representations that capture both collaborative and item-specific information. Recent applications, notably in large-scale e-commerce recommendation (Zhao et al., 14 Oct 2025), leverage this encoding for cold-start scenarios by aligning collaborative signals and fine-grained item features through multi-layer semantic IDs. The method is also related to advanced neural residual quantization techniques (Vallaeys et al., 6 Jan 2025), quantum data encoding topologies (Vlasic et al., 2022, Zang et al., 20 May 2025), and Unequal Error Protection (UEP) mechanisms in transmission codes (Elliadka et al., 2014).

1. Foundations of RQ-OPQ Encoding

RQ-OPQ encoding combines the complementary strengths of Residual Quantization and Optimized Product Quantization. In RQ, the quantization process is structured hierarchically: the data vector is first approximated by a codeword in the initial codebook; the residual error vector is then quantized in subsequent layers, recursively decomposing the signal. OPQ enhances product quantization by finding an orthogonal transformation of the input space that minimizes quantization error given a fixed number of partitions.

The central construction in SMILE (Zhao et al., 14 Oct 2025) is a five-stage semantic ID (SID):

$I_{sid} = (\text{RQ}_1, \text{RQ}_2, \text{RQ}_3, \text{OPQ}_1, \text{OPQ}_2)$

Here, $\text{RQ}_1$ , $\text{RQ}_2$ , and $\text{RQ}_3$ (obtained via RQ-Kmeans) encode shared semantic and collaborative item information, while $\text{OPQ}_1$ and $\text{OPQ}_2$ capture discriminative, fine-grained content from the residuals.

2. Encoding Pipeline and Mathematical Formulation

The encoding pipeline in SMILE proceeds as follows:

Initial Embedding: Generate a feature embedding via a two-tower model drawing from both item content and conversion (behavioral) data.
Hierarchical RQ-Kmeans Encoding: Quantize the embedding using three hierarchical RQ codebooks:

$I_{emb}^{RQ} = \operatorname{Fuse}(\text{RQ}_1, \text{RQ}_2, \text{RQ}_3)$

Residual OPQ Encoding: Quantize the residuals with two OPQ codebooks, yielding

$I_{sid} = (\text{RQ}_1, \text{RQ}_2, \text{RQ}_3, \text{OPQ}_1, \text{OPQ}_2)$

Adaptive Transfer Fusion: Fuse collaborative (ID) and semantic (RQ) embeddings using an adaptive gate:

$T_g = \text{DNN}(C, U_p, X)$

$I_{emb}^c = T_g \cdot I_{emb}^{id} + (1 - T_g) \cdot I_{emb}^{RQ}$

Supervised Alignment and Contrastive Learning: Employ transfer and InfoNCE losses:

$\begin{align*} L_{\text{trans}} &= T_g \cdot KL(\text{sg}(I_{emb}^{id}), I_{emb}^{RQ}) + (1-T_g) \cdot KL(I_{emb}^{id}, \text{sg}(I_{emb}^{RQ})) \ L_{\text{cont}} &= -\log \left( \frac{\sum \exp(\operatorname{sim}(I^i, I^{i+})/\tau)}{\sum \exp(\operatorname{sim}(I^i, I^{i+})/\tau) + \sum \exp(\operatorname{sim}(I^i, I^{i-})/\tau)} \right) \end{align*}$

Final Embedding: The refined item representation is

$I_{emb}^f = I_{emb}^c + \lambda \cdot I_{emb}^{OPQ}$

The overall objective function is

$L_{total} = L_{BCE} + \alpha_1 L_{trans} + \alpha_2 L_{cont}$

3. Practical Applications and Empirical Performance

RQ-OPQ encoding is specifically designed to address the cold-start problem in e-commerce search and recommendation, where new items lack historical interaction signals. The multi-stage SID constructed through RQ-OPQ enables transfer of collaborative signals and injection of fine-grained content via contrastive learning.

Empirical results in SMILE (Zhao et al., 14 Oct 2025) include:

Offline: In full-sample evaluations, SMILE achieves AUC $0.8725$ and GAUC $0.6394$, surpassing SPM_SID, DAS, and SaviorRec by $0.38$ and $0.34$ percentage points respectively.
Cold-Start: Gains are more salient, with cold-item AUC rising from approx. $0.843$ to $0.8528$.
Online A/B Testing: Over 14 days, buyer metrics increase $+1.72\%$ and orders $+2.23\%$ overall; in the cold-start segment, buyers surge $+3.512\%$ and orders $+9.639\%$ .

4. Relationships with Advanced Vector Quantization Methods

The RQ-OPQ scheme is structurally similar to recent advances in neural residual quantization. QINCo2 (Vallaeys et al., 6 Jan 2025), for example, replaces static codebooks with conditioned neural codebooks and utilizes candidate pre-selection and beam search for encoding efficiency. It further employs a lookup-based pairwise decoder for scalable search, yielding an MSE reduction of $34\%$ (QINCo2: $0.19 \times 10^{-4}$ vs. QINCo: $0.32 \times 10^{-4}$ ) and a $24\%$ improvement in recall@1 for top- $k$ search.

RQ-OPQ's layered quantization and adaptive fusion mechanisms resonate with the goal of maximizing rate-distortion tradeoffs under resource constraints. The approach also aligns with Norm-Explicit Quantization (Dai et al., 2019), which separately quantizes norms and directions to enhance inner product similarity search.

5. Connections to Quantum Data Encoding and Topological Analysis

While RQ-OPQ encoding in SMILE is not implemented as a quantum circuit, related literature (Vlasic et al., 2022, Zang et al., 20 May 2025) investigates quantum analogs of encoding strategies. Quantum machine learning requires data transformation into quantum states via feature maps (such as angle encoding, amplitude encoding, or IQP). These mappings affect the topology of embedded data, impacting clusterability and expressiveness.

Quantum topological analysis (qTDA) methods illustrated in (Vlasic et al., 2022) reveal that encoding methods alter the persistent Betti numbers and homological structure of the data in Hilbert space. A plausible implication is that a hierarchical and discriminative encoding like RQ-OPQ could, if ported to quantum settings, influence the complexity and separability of clusters, with potential ramifications for quantum-enhanced recommendation models.

6. Implications, Limitations, and Future Research Directions

The adoption of RQ-OPQ encoding in SMILE reveals several methodological advances and open questions:

Multimodal Extension: Integrating image, text, and contextual features into the semantic ID stack could enrich item representations.
Adaptive Transfer Mechanism: Further work may improve the self-adjustment capability of the fusion gate (parameter $T_g$ ).
Real-Time and Scalability: Scaling to dynamic, large-catalog environments necessitates refinement in computational efficiency, especially as collaborative signals and item features evolve.
Contrastive Learning Sensitivity: Managing the selection of top- $k$ positives and negatives is crucial for contrastive loss stability, particularly in non-stationary data environments.
Benchmarking in Quantum ML: While RQ-OPQ encoding shares conceptual similarities with advanced quantum embedding schemes, direct translation and benchmarking in QML require systematic experimental evaluation on comparable datasets.

7. Comparison with UEP Transmission Codes

The methodology of RQ-OPQ encoding may inform UEP schemes such as Priority Based Precode Ratio (PBPR) for RaptorQ (Elliadka et al., 2014), where error protection is tuned by adapting the number of precode symbols for different data priorities. PBPR achieves up to $2\times$ improvement in correction capability for high-priority data with modest overhead in encoding/decoding times. This analogy suggests that similar stratification of protection or representation can be fruitful in both transmission and representation learning contexts.

RQ-OPQ encoding, exemplified in SMILE, represents a multi-layer quantization approach that fuses collaborative and content signals, outperforming previous e-commerce representation methods especially for cold-start items. It draws from advances in neural quantization and finds conceptual resonance in quantum data encoding and transmission code UEP mechanisms. Future research will likely expand its application domain, optimize its adaptive fusion, and examine its role in multimodal and quantum learning settings.