
Dual-Tower Synergy for Consistent Indexing

Updated 16 December 2025
  • The paper introduces a dual-view indexing mechanism that aligns query and item embeddings, boosting coarse candidate selection and overall retrieval performance.
  • It details a methodology where K-means clustering in query space and residual quantization in item space eliminate cross-tower spatial distortions.
  • Empirical results show significant improvements in recall and ranking metrics on benchmarks like MS-MARCO and real-world e-commerce data.

Consistent Indexing with Dual-Tower Synergy Module (CI) is a framework developed to address limitations in large-scale dense retrieval systems, specifically those stemming from representational misalignment in dual-tower architectures. In conventional dense retrieval, dual-tower models separately encode queries and items into distinct embedding spaces. When such representations are merged within a single retrieval index, the resulting spatial mismatch can degrade matching accuracy and retrieval stability, and disproportionately hurt performance on long-tail queries. The CI module introduces a dual-view indexing scheme that preserves semantic consistency between retrieval stages, integrates tightly with standard hierarchical indexing architectures (e.g., IVF-PQ), and supports billion-scale deployment without additional storage or online computational overhead (Wang et al., 15 Dec 2025).

1. Motivation and Problem Setting

Dense retrieval systems, which have become dominant in large-scale information retrieval due to their efficiency and accuracy, usually employ a coarse-to-fine hierarchical architecture. The prevalent dual-tower structure comprises two separate encoders: $f_q$ for queries and $f_i$ for items, producing embeddings in potentially misaligned spaces. During index construction and retrieval, this asymmetry engenders two primary issues:

  • Space misalignment: Query and item embeddings are not guaranteed to share geometric consistency, distorting nearest neighbor retrieval.
  • Index inconsistency: Clustering, residual quantization, and candidate selection may operate across heterogeneous embedding spaces, degrading both recall and ranking metrics.

These alignment issues become increasingly consequential in generative recommendation systems utilizing semantic identifiers, where conflicting geometry between training and inference reduces the capacity and generalization of downstream models (Wang et al., 15 Dec 2025).

2. Dual-View Indexing Strategy

To resolve representational inconsistencies, CI transforms the dual-tower pipeline into a two-view indexing mechanism. Offline, each item $I$ in corpus $\mathcal{D}$ is processed as follows:

  • Query-tower encoding: $e_I^q = f_q(I)$, providing a structural vector in the query embedding space.
  • Item-tower encoding: $e_I^i = f_i(I)$, yielding a representation vector with potentially enriched, item-specific semantics.

K-means clustering is performed on $\{e_I^q : I \in \mathcal{D}\}$ to determine $K$ centroids $\{c_1, \dots, c_K\}$ in the query space. Every item $I$ is assigned to the nearest centroid $k(I)$ in this space:

$k(I) = \arg\min_j \|e_I^q - c_j\|^2$

Residual vectors, representing item-specific detail, are then computed in the item space:

$r_I = e_I^i - c_{k(I)}$

The index (e.g., IVF-PQ) maintains centroids in the query space and per-item product-quantized codes on the item-space residuals. This strictly segregates the structural (query-space) and residual (item-space) aspects of indexing, ensuring that the coarse candidate selection is always aligned with query geometry, while the fine stage leverages item-specific expressivity (Wang et al., 15 Dec 2025).
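The offline construction above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the toy random matrices stand in for the trained tower outputs, and the K-means loop is a bare Lloyd iteration with hypothetical sizes (`d`, `n_items`, `K`).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two tower outputs; in CI these come from trained
# encoders f_q and f_i applied to every item in the corpus.
d, n_items, K = 8, 200, 4
items_q = rng.normal(size=(n_items, d))                    # e_I^q = f_q(I)
items_i = items_q + 0.05 * rng.normal(size=(n_items, d))   # e_I^i = f_i(I)

# K-means in the *query* space (a few plain Lloyd iterations for the sketch).
centroids = items_q[rng.choice(n_items, K, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((items_q[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for j in range(K):
        members = items_q[assign == j]
        if len(members):
            centroids[j] = members.mean(axis=0)

# Residuals are taken in the *item* space against the query-space centroid,
# i.e. r_I = e_I^i - c_{k(I)}.
residuals = items_i - centroids[assign]
```

By construction, `centroids[assign] + residuals` reproduces the item-tower embeddings exactly (before any PQ compression), which is what the fine stage relies on.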

3. Formalization and Search Procedure

The CI search protocol is defined as follows:

  • Embeddings: $f_q: \text{Query} \to \mathbb{R}^d$, $f_i: \text{Item} \to \mathbb{R}^d$
  • Similarity metric: $S(u, v) = u^\top v$
  • Clustering: centroids $c_1, \dots, c_K$ derived from query-tower item representations.
  • Residuals: $r_I = f_i(I) - c_{k(I)}$
  • Indexing: ANN structures are built with $\{c_j\}$ as coarse centroids and per-item quantized codes encoding $r_I$.

At query time:

  1. A query $Q$ is embedded via $e_Q = f_q(Q)$.
  2. The $P$ closest centroids $\{c_{j_p}\}$ to $e_Q$ are selected.
  3. Items indexed under these centroids have their residuals decoded (typically via PQ).
  4. Each candidate item $I$ is scored by:

$\text{score}(I) = -\|e_Q - (c_{k(I)} + \text{PQ.decode}(\text{code}_I))\|^2$

This aligns the initial candidate selection tightly with the learned query geometry, eliminating cross-tower distortions, while the fine-grained step exploits the rich representation of $f_i$ (Wang et al., 15 Dec 2025).
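The four query-time steps can be sketched as follows. For brevity this hypothetical example stores residuals uncompressed in place of PQ codes (so "decode" is a plain lookup); the index contents are random toy data, and the names `search`, `probe`, and `top_k` are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_items, K, P = 8, 200, 4, 2

# Assume an index built as in Section 2: query-space centroids, per-item
# centroid assignments, and item-space residuals (uncompressed here).
centroids = rng.normal(size=(K, d))
assign = rng.integers(0, K, size=n_items)
residuals = 0.1 * rng.normal(size=(n_items, d))

def search(e_Q, top_k=5):
    # Steps 1-2: probe the P centroids closest to the query embedding e_Q.
    probe = np.argsort(((centroids - e_Q) ** 2).sum(-1))[:P]
    cand = np.flatnonzero(np.isin(assign, probe))
    # Steps 3-4: reconstruct each candidate as c_{k(I)} + r_I and score it
    # by negative squared distance to e_Q.
    recon = centroids[assign[cand]] + residuals[cand]
    scores = -((recon - e_Q) ** 2).sum(-1)
    order = np.argsort(-scores)[:top_k]
    return cand[order], scores[order]

ids, scores = search(rng.normal(size=d))
```

Because the coarse probe operates on query-space centroids while reconstruction uses item-space residuals, the two views of each item never mix geometries within a single stage.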

4. Theoretical Consistency

CI’s retrieval consistency theorem establishes that, under the condition that $f_q$ and $f_i$ are well aligned ($\|f_q(I) - f_i(I)\|$ is small) and $f_q$'s space is isotropic, the ANN search in $\{f_q(I)\}$ attains the same coarse candidate coverage as the ideal objective $\arg\max_{I \in \mathcal{D}} S(f_q(Q), f_i(I))$. The argument proceeds as follows:

  1. If $f_q(I) \approx f_i(I)$ for all $I$, then $S(f_q(Q), f_i(I)) \approx S(f_q(Q), f_q(I))$.
  2. For normalized, isotropic embeddings, maximizing $u^\top v$ over $v$ is equivalent to minimizing $\|u - v\|$.
  3. Clustering and coarse filtering in $f_q$ space produce a quantized, yet consistent, approximation of the nearest-neighbor objective.

A corollary is that using $f_q(I)$ for clustering while retaining $f_i(I)$ for residual quantization preserves semantic consistency and enables finer discrimination between items, leveraging the greater expressiveness of the item tower (Wang et al., 15 Dec 2025).
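The alignment premise behind the theorem can be checked numerically. In this toy sketch (synthetic unit vectors, not the paper's data), the item-tower embeddings are small perturbations of the query-tower ones; the score of the item found by nearest-neighbor search in $f_q$ space is then provably within $2\varepsilon$ of the ideal cross-tower maximum, where $\varepsilon$ bounds the per-item tower gap.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 16, 500

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Well-aligned towers: f_i(I) is a small perturbation of f_q(I).
f_q_items = normalize(rng.normal(size=(n, d)))
f_i_items = normalize(f_q_items + 0.01 * rng.normal(size=(n, d)))
q = normalize(rng.normal(size=d))

ideal = np.argmax(f_i_items @ q)                    # argmax_I S(f_q(Q), f_i(I))
proxy = np.argmin(((f_q_items - q) ** 2).sum(-1))   # NN search in f_q space

# The proxy's cross-tower score trails the ideal maximum by at most
# 2 * max_I ||f_q(I) - f_i(I)||, which is tiny here.
gap = float((f_i_items @ q).max() - f_i_items[proxy] @ q)
```

With larger tower misalignment the gap grows linearly, which is why CI relies on prior alignment (Section 5) rather than repairing it at index time.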

5. Implementation Workflow and Pseudocode

The end-to-end CI construction and retrieval process can be summarized as:

| Step | Offline Index Construction | Online Retrieval |
| --- | --- | --- |
| Input | Trained $f_q$, $f_i$, corpus $\mathcal{D}$ | Query $Q$, $f_q$ |
| Encoding | $e_I^q \gets f_q(I)$, $e_I^i \gets f_i(I)$ | $e_Q \gets f_q(Q)$ |
| Clustering | K-means on $\{e_I^q\}$: $\{c_j\}$ | Select top $P$ centroids |
| Assignment/Residual | $k(I) = \arg\min_j \|e_I^q - c_j\|^2$, $r_I = e_I^i - c_{k(I)}$ | As in index |
| Quantization/Indexing | PQ-encode $r_I$, assign to list $k(I)$ | PQ-decode for candidates |
| Ranking | Build IVF-PQ with centroids + PQ codes | $-\|e_Q - (c_{k(I)} + r_I)\|^2$ |

No additional loss is introduced during indexing; the method depends on prior alignment of the towers (e.g., via input swapping in the SymmAligner module). The system is compatible with standard IVF-PQ codebooks and does not incur extra online latency (Wang et al., 15 Dec 2025).
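For completeness, the PQ-encode/PQ-decode steps in the table can be sketched as a plain product quantizer over the item-space residuals. This is a generic PQ illustration under toy assumptions (random residuals, hypothetical sizes `m`, `ksub`), not the paper's codebook configuration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m, ksub = 500, 8, 4, 16   # m subspaces of d//m dims, ksub codes each
dsub = d // m

residuals = rng.normal(size=(n, d)).astype(np.float32)

# Train one small codebook per subspace (a few Lloyd iterations, sketch only).
codebooks = np.empty((m, ksub, dsub), dtype=np.float32)
codes = np.empty((n, m), dtype=np.uint8)
for s in range(m):
    sub = residuals[:, s * dsub:(s + 1) * dsub]
    cb = sub[rng.choice(n, ksub, replace=False)].copy()
    for _ in range(10):
        a = np.argmin(((sub[:, None] - cb[None]) ** 2).sum(-1), axis=1)
        for j in range(ksub):
            if np.any(a == j):
                cb[j] = sub[a == j].mean(axis=0)
    codebooks[s], codes[:, s] = cb, a

def pq_decode(code):
    # Reconstruct an approximate residual from its m codebook indices.
    return np.concatenate([codebooks[s][code[s]] for s in range(m)])

approx = np.stack([pq_decode(c) for c in codes])
```

Each item is stored as `m` one-byte codes, matching the constant-storage claim: CI changes which space the centroids and residuals live in, not how much is stored per item.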

6. Empirical Results and Engineering Considerations

In industrial-scale deployments, the CI module operates with cluster count $K = 4096$, probe number $P = 64$, and PQ code lengths of 64 bytes per item. Storage overhead is unchanged relative to conventional IVF-PQ, as CI reuses cluster centroids in query space and per-item codebooks on item-space residuals. Offline computation requires one additional forward pass of $f_q$ per document, an acceptable cost given offline indexing. Online latency remains identical, with query complexity $O(T_q + P \cdot (D + \text{code-decode}))$, where $D$ is the embedding dimension.

The reported empirical enhancements include:

  • MS-MARCO (nprobe=1): Recall@10 improves from 0.2767 to 0.3268 (approx. 18% relative), MRR@100 from 0.1771 to 0.2157.
  • MS-MARCO (nprobe=64): MRR@100 increases from 0.4353 to 0.4480.
  • Production e-commerce (1M items; 10M interactions): Recall@100 rises by 4% relative, NDCG@100 by 9.3% relative after indexing (Wang et al., 15 Dec 2025).

7. Summary and Significance

Consistent Indexing with Dual-Tower Synergy leverages aligned dual-tower embeddings to construct a two-view hierarchical index, where the coarse clustering and candidate selection are performed strictly in query-tower geometry, eliminating cross-tower spatial inconsistencies. The fine stage retains the full representational richness of the item tower, enabling improved recall and ranking without additional inference latency or storage costs. This approach is provably consistent, lightweight for engineering at billion-item scales, and validated by significant empirical improvements across public and industrial datasets (Wang et al., 15 Dec 2025).
