Dual-Tower Synergy for Consistent Indexing

Updated 16 December 2025

The paper introduces a dual-view indexing mechanism that aligns query and item embeddings, boosting coarse candidate selection and overall retrieval performance.
It details a methodology where K-means clustering in query space and residual quantization in item space eliminate cross-tower spatial distortions.
Empirical results show significant improvements in recall and ranking metrics on benchmarks like MS-MARCO and real-world e-commerce data.

Consistent Indexing with Dual-Tower Synergy Module (CI) is a framework developed to address limitations in large-scale dense retrieval systems, specifically those stemming from representational misalignment in dual-tower architectures. In conventional dense retrieval, dual-tower models encode queries and items separately into distinct embedding spaces. When such representations are merged within a single retrieval index, the resulting spatial mismatch can degrade matching accuracy and retrieval stability and can hurt performance on long-tail queries. The CI module introduces a dual-view indexing scheme that preserves semantic consistency between retrieval stages, integrates tightly with standard hierarchical indexing architectures (e.g., IVF-PQ), and supports billion-scale deployment without additional storage or online computational overhead (Wang et al., 15 Dec 2025).

1. Motivation and Problem Setting

Dense retrieval systems, which have become dominant in large-scale information retrieval due to their efficiency and accuracy, usually employ a coarse-to-fine hierarchical architecture. The prevalent dual-tower structure comprises two separate encoders: $f_q$ for queries and $f_i$ for items, producing embeddings in potentially misaligned spaces. During index construction and retrieval, this asymmetry engenders two primary issues:

Space misalignment: Query and item embeddings are not guaranteed to share geometric consistency, distorting nearest neighbor retrieval.
Index inconsistency: Clustering, residual quantization, and candidate selection may operate across heterogeneous embedding spaces, degrading both recall and ranking metrics.

These alignment issues become increasingly consequential in generative recommendation systems utilizing semantic identifiers, where conflicting geometry between training and inference reduces the capacity and generalization of downstream models (Wang et al., 15 Dec 2025).

2. Dual-View Indexing Strategy

To resolve representational inconsistencies, CI transforms the dual-tower pipeline into a two-view indexing mechanism. Offline, each item $I$ in corpus $\mathcal{D}$ is processed as follows:

Query-tower encoding: $e_I^q = f_q(I)$, providing a structural vector in the query embedding space.
Item-tower encoding: $e_I^i = f_i(I)$, yielding a representation vector with potentially enriched, item-specific semantics.

K-means clustering is performed on $\{e_I^q : I \in \mathcal{D}\}$ to determine $K$ centroids $\{c_1, \dots, c_K\}$ in the query space. Every item $I$ is assigned to the nearest centroid $k(I)$ in this space:

$k(I) = \underset{j}{\arg\min}\ \|e_I^q - c_j\|^2$

Residual vectors, representing item-specific detail, are then computed in the item space:

$r_I = e_I^i - c_{k(I)}$

The index (e.g., IVF-PQ) maintains centroids in the query space and per-item product-quantized codes on the item-space residuals. This strictly segregates the structural (query-space) and residual (item-space) aspects of indexing, ensuring that the coarse candidate selection is always aligned with query geometry, while the fine stage leverages item-specific expressivity (Wang et al., 15 Dec 2025).
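The offline construction described in this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `kmeans` helper is plain Lloyd's algorithm, the names `build_dual_view_index`, `e_q`, and `e_i` are hypothetical, and the toy embeddings are random vectors in which the item-tower view is a small perturbation of the query-tower view (standing in for two already aligned towers).

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, d2.argmin(axis=1)

def build_dual_view_index(e_q, e_i, k):
    """Cluster in query space, compute residuals in item space."""
    centroids, assign = kmeans(e_q, k)      # c_j and k(I), from f_q(I) only
    residuals = e_i - centroids[assign]     # r_I = e_I^i - c_{k(I)}
    lists = {j: np.flatnonzero(assign == j) for j in range(k)}  # inverted lists
    return centroids, assign, residuals, lists

# Toy data (assumption): 200 items, 8 dimensions, nearly aligned towers.
rng = np.random.default_rng(0)
e_q = rng.normal(size=(200, 8))
e_i = e_q + 0.05 * rng.normal(size=(200, 8))
centroids, assign, residuals, lists = build_dual_view_index(e_q, e_i, k=4)
print(centroids.shape, residuals.shape, sum(len(v) for v in lists.values()))
```

In a real IVF-PQ index the residuals would then be product-quantized rather than stored in full; the sketch stops before that step to keep the two views easy to see.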

3. Formalization and Search Procedure

The CI search protocol is defined as follows:

Embeddings: $f_q:\text{Query} \to \mathbb{R}^d$, $f_i:\text{Item} \to \mathbb{R}^d$
Similarity metric: $S(u,v) = u^\top v$
Clustering: Centroids $c_1$ to $c_K$ derived from query-tower item representations.
Residuals: $r_I = f_i(I) - c_{k(I)}$
Indexing: ANN structures are built with $(c_j)$ as coarse centroids and per-item quantized codes encoding $(r_I)$.

At query time:

A query $Q$ is embedded via $e_Q = f_q(Q)$.
The $P$ closest centroids $\{c_{j_p}\}$ to $e_Q$ are selected.
Items indexed under these centroids have their residuals decoded (typically via PQ).
Each candidate item $I$ is scored by:

$\text{score}(I) = -\|e_Q - (c_{k(I)} + \text{PQ.decode}(\text{code}_I))\|^2$

This aligns the initial candidate selection tightly with the learned query geometry, eliminating cross-tower distortions, while the fine-grained step exploits the rich representation of $f_i$ (Wang et al., 15 Dec 2025).
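The four query-time steps can be sketched as below. This is an illustrative sketch only: the function name `search` and all variable names are hypothetical, the centroids are simply the first six item vectors rather than trained k-means centroids, and exact residuals stand in for $\text{PQ.decode}(\text{code}_I)$ so that the example stays short.

```python
import numpy as np

def search(e_query, centroids, lists, decoded_residuals, nprobe, topk):
    """Coarse probe in query space, then score reconstructed item vectors."""
    # Step 2: the P closest centroids to e_Q.
    d2_centroid = ((centroids - e_query) ** 2).sum(axis=1)
    probe = np.argsort(d2_centroid)[:nprobe]
    # Step 3: gather candidates from the probed inverted lists.
    cand = np.concatenate([lists[j] for j in probe])
    owner = np.concatenate([np.full(len(lists[j]), j) for j in probe])
    if cand.size == 0:
        return cand, np.empty(0)
    # Step 4: score(I) = -||e_Q - (c_{k(I)} + decoded residual)||^2.
    recon = centroids[owner] + decoded_residuals[cand]
    scores = -((recon - e_query) ** 2).sum(axis=1)
    order = np.argsort(-scores)[:topk]
    return cand[order], scores[order]

# Toy setup (assumption): 300 items, 8 dimensions, 6 coarse lists.
rng = np.random.default_rng(1)
e_q_items = rng.normal(size=(300, 8))                     # f_q(I)
e_i_items = e_q_items + 0.05 * rng.normal(size=(300, 8))  # f_i(I)
centroids = e_q_items[:6].copy()          # stand-in for k-means centroids
assign = ((e_q_items[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
lists = [np.flatnonzero(assign == j) for j in range(6)]
residuals = e_i_items - centroids[assign]  # stand-in for PQ-decoded residuals

query = e_q_items[42] + 0.01 * rng.normal(size=8)         # step 1: e_Q
ids, scores = search(query, centroids, lists, residuals, nprobe=2, topk=5)
print(ids, scores)
```

Probing every list (`nprobe=6` here) reduces the procedure to a brute-force scan over the reconstructed item vectors, which is a convenient sanity check when experimenting with smaller `nprobe` values.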

4. Theoretical Consistency

CI’s retrieval consistency theorem states that if $f_q$ and $f_i$ are well aligned ($\|f_q(I) - f_i(I)\|$ is small) and the space of $f_q$ is isotropic, then ANN search over $\{f_q(I)\}$ attains the same coarse candidate coverage as the ideal objective $\arg\max_{I \in \mathcal{D}} S(f_q(Q), f_i(I))$. The argument proceeds as follows:

If $f_q(I) \approx f_i(I)$ for all $I$, then $S(f_q(Q), f_i(I)) \approx S(f_q(Q), f_q(I))$.
For normalized, isotropic embeddings, maximizing $u^\top v$ over $v$ is equivalent to minimizing $\|u - v\|$ .
Clustering and coarse filtering in $f_q$ space produces a quantized, yet consistent, approximation of nearest-neighbor objectives.

A corollary is that using $f_q(I)$ for clustering while retaining $f_i(I)$ for residual quantization preserves semantic consistency and enables finer discrimination between items, leveraging the greater expressiveness of the item tower (Wang et al., 15 Dec 2025).
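Two standard identities make the first two steps of this argument concrete. They are offered as a supporting sketch, not as the paper's own proof; the alignment bound $\varepsilon$ and the unit-norm assumption on $f_q(Q)$ are introduced here for illustration.

For unit-norm $u$ and $v$:

$\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\,u^\top v = 2 - 2\,u^\top v$

so ranking candidates by largest $u^\top v$ and by smallest $\|u - v\|$ gives the same order. For the alignment step, the Cauchy-Schwarz inequality gives:

$|S(f_q(Q), f_i(I)) - S(f_q(Q), f_q(I))| = |f_q(Q)^\top (f_i(I) - f_q(I))| \le \|f_q(Q)\|\,\|f_i(I) - f_q(I)\|$

Hence, if $\|f_q(I) - f_i(I)\| \le \varepsilon$ for every item and $\|f_q(Q)\| = 1$, swapping $f_i(I)$ for $f_q(I)$ in the coarse stage changes each score by at most $\varepsilon$.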

5. Implementation Workflow and Pseudocode

The end-to-end CI construction and retrieval process can be summarized as:

Step	Offline Index Construction	Online Retrieval
Input	Trained $f_q$, $f_i$, corpus $\mathcal{D}$	Query $Q$, $f_q$
Encoding	$e_I^q \gets f_q(I)$, $e_I^i \gets f_i(I)$	$e_Q \gets f_q(Q)$
Clustering	K-means on $\{e_I^q\}$ yields $\{c_j\}$	Select top $P$ centroids
Assignment/Residual	$k(I) = \arg\min_j \|e_I^q - c_j\|^2$, $r_I = e_I^i - c_{k(I)}$	Reuse assignments stored in the index
Quantization/Indexing	PQ-encode $r_I$, assign to list $k(I)$	PQ-decode for candidates
Ranking	Build IVF-PQ with centroids + PQ-codes	$-\|e_Q - (c_{k(I)} + \text{PQ.decode}(\text{code}_I))\|^2$

No additional loss is introduced during indexing; the method depends on prior alignment of the towers (e.g., via input swapping in the SymmAligner module). The system is compatible with standard IVF-PQ codebooks and does not incur extra online latency (Wang et al., 15 Dec 2025).
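The "PQ-encode" and "PQ-decode" entries in the table refer to product quantization of the item-space residuals. The sketch below shows one generic way this works, under stated assumptions: the function names, the toy sizes (`m=4` sub-vectors, `ksub=16` codewords each), and the small k-means trainer are all hypothetical and are not taken from the paper. With one byte per sub-vector, a 64-byte code would correspond to 64 sub-quantizers, but that mapping is an inference rather than a detail confirmed by the source.

```python
import numpy as np

def train_codebooks(r, m, ksub, iters=10, seed=0):
    """One small k-means codebook per sub-vector; returns shape (m, ksub, dsub)."""
    n, d = r.shape
    assert d % m == 0, "dimension must split evenly into m sub-vectors"
    dsub = d // m
    rng = np.random.default_rng(seed)
    books = np.empty((m, ksub, dsub))
    for s in range(m):
        sub = r[:, s * dsub:(s + 1) * dsub]
        cb = sub[rng.choice(n, size=ksub, replace=False)].copy()
        for _ in range(iters):
            a = ((sub[:, None] - cb[None]) ** 2).sum(-1).argmin(1)
            for j in range(ksub):
                if np.any(a == j):
                    cb[j] = sub[a == j].mean(axis=0)
        books[s] = cb
    return books

def pq_encode(r, books):
    """Replace each sub-vector by the index of its nearest codeword."""
    m, ksub, dsub = books.shape
    codes = np.empty((len(r), m), dtype=np.uint8)
    for s in range(m):
        sub = r[:, s * dsub:(s + 1) * dsub]
        nearest = ((sub[:, None] - books[s][None]) ** 2).sum(-1).argmin(1)
        codes[:, s] = nearest.astype(np.uint8)
    return codes

def pq_decode(codes, books):
    """Concatenate the selected codewords back into full-length vectors."""
    return np.concatenate(
        [books[s][codes[:, s]] for s in range(books.shape[0])], axis=1)

# Toy residuals (assumption): 500 items, 8 dimensions.
rng = np.random.default_rng(0)
r = 0.1 * rng.normal(size=(500, 8))
books = train_codebooks(r, m=4, ksub=16)
codes = pq_encode(r, books)       # 4 bytes per item in this toy setting
r_hat = pq_decode(codes, books)   # approximate residuals used at query time
print(codes.shape, codes.dtype, float(((r - r_hat) ** 2).sum(1).mean()))
```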

6. Empirical Results and Engineering Considerations

In industrial-scale deployments, the CI module operates with cluster count $K=4096$, probe number $P=64$, and PQ codes of 64 bytes per item. Storage overhead is unchanged relative to conventional IVF-PQ, because CI stores the same two components: cluster centroids (here in query space) and per-item PQ codes (here on item-space residuals). Offline computation requires one additional forward pass of $f_q$ per document, an acceptable cost because indexing runs offline. Online latency remains identical, with query complexity $O(T_q + P \cdot (D + \text{code-decode}))$, where $D$ is the embedding dimension.
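A back-of-the-envelope calculation shows what these settings imply. $K$, $P$, and the 64-byte code length come from the text; the corpus size of one billion items, the embedding dimension of 768, and float32 centroids are assumptions chosen only to make the arithmetic concrete.

```python
K = 4096                    # coarse centroids (from the text)
P = 64                      # probed lists per query (from the text)
code_bytes = 64             # PQ code size per item (from the text)
n_items = 1_000_000_000     # assumption: "billion-scale" corpus
dim = 768                   # assumption: embedding dimension D
float_bytes = 4             # assumption: float32 centroids

probe_fraction = P / K                       # 0.015625 of the lists
pq_storage = n_items * code_bytes            # 64_000_000_000 bytes
centroid_storage = K * dim * float_bytes     # 12_582_912 bytes

print(f"lists probed per query: {probe_fraction:.4%}")
print(f"PQ code storage: {pq_storage / 1e9:.1f} GB")
print(f"centroid storage: {centroid_storage / 1e6:.1f} MB")
```

Under these assumptions the per-item codes dominate storage by several orders of magnitude, and neither term depends on which tower produced the centroids.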

The reported empirical enhancements include:

MS-MARCO (nprobe=1): Recall@10 improves from 0.2767 to 0.3268 (approx. 18% relative), MRR@100 from 0.1771 to 0.2157.
MS-MARCO (nprobe=64): MRR@100 increases from 0.4353 to 0.4480.
Production e-commerce (1M items; 10M interactions): Recall@100 rises by 4% relative, NDCG@100 by 9.3% relative after indexing (Wang et al., 15 Dec 2025).
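The relative gains implied by the MS-MARCO figures above can be checked with a few lines of arithmetic. The helper name `relative_gain` is hypothetical; the input values are the ones quoted in the list.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before

recall10_np1 = relative_gain(0.2767, 0.3268)   # Recall@10, nprobe=1
mrr100_np1 = relative_gain(0.1771, 0.2157)     # MRR@100, nprobe=1
mrr100_np64 = relative_gain(0.4353, 0.4480)    # MRR@100, nprobe=64

print(f"Recall@10 (nprobe=1): {recall10_np1:.1%}")   # 18.1%
print(f"MRR@100 (nprobe=1): {mrr100_np1:.1%}")       # 21.8%
print(f"MRR@100 (nprobe=64): {mrr100_np64:.1%}")     # 2.9%
```

The gain shrinks as more lists are probed, which fits the claim that CI mainly improves coarse candidate selection: with a larger probe budget, a conventional index already recovers most of the candidates that a misaligned coarse stage would otherwise miss.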

7. Summary and Significance

Consistent Indexing with Dual-Tower Synergy uses aligned dual-tower embeddings to construct a two-view hierarchical index in which coarse clustering and candidate selection are performed strictly in query-tower geometry, removing cross-tower spatial inconsistencies from the coarse stage. The fine stage retains the full representational richness of the item tower, enabling improved recall and ranking without additional inference latency or storage costs. The approach comes with a consistency guarantee under the alignment and isotropy conditions above, is lightweight enough to engineer at billion-item scale, and is supported by the recall and MRR gains reported on MS-MARCO and on production e-commerce data (Wang et al., 15 Dec 2025).

References

Wang et al. (15 Dec 2025). A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval.
