Siamese Sharing Dual Encoder (SDE)

Updated 26 March 2026
  • Siamese Sharing (SDE) is a dual encoder architecture featuring complete weight sharing across both towers to create a unified semantic space.
  • It employs a shared Transformer encoder and projection layer to achieve symmetric query and context representations, enhancing retrieval performance.
  • Empirical evaluations on QA and IR tasks show that SDE generally outperforms asymmetric dual encoders on Precision@1, MRR, and top-k accuracy.

Siamese Sharing (SDE), formally known as the Siamese Dual Encoder architecture, is a dual-tower neural retrieval model characterized by strict parameter sharing across both encoder towers. SDE is prominently used in question-answering (QA) and information retrieval (IR) systems, where it has outperformed dual-encoder variants with asymmetric or partially shared parameters. The SDE applies the same token embedding layer, Transformer encoder stack, and projection layer weights to both the query (e.g., a question) and the context (e.g., a candidate answer or passage), so that both kinds of input are embedded in a single space, which suits tasks that require semantic alignment between queries and contexts (Dong et al., 2022).

1. Architecture and Parameter Sharing

The SDE model consists of two identical towers, each processing distinct inputs but sharing all model parameters:

  • Encoder Backbone: Both towers use a Transformer encoder (T5 1.1 in small, base, or large configurations). The hidden-state dimension $D_h$ varies with model size: 512 (small), 768 (base), or 1024 (large).
  • Pooling and Projection: Final hidden states from the Transformer are averaged, producing representations $h_q, h_a \in \mathbb{R}^{D_h}$. These are projected to the retrieval embedding space of dimension $D_e$ (typically $D_e = D_h$) via a shared linear layer: $v_x = W_{\mathrm{proj}} h_x + b_{\mathrm{proj}}$, with $W_{\mathrm{proj}} \in \mathbb{R}^{D_e \times D_h}$, $b_{\mathrm{proj}} \in \mathbb{R}^{D_e}$.
  • Parameter Sharing: All components are shared:
    • Token embeddings
    • Transformer weights
    • Projection layer ($W_{\mathrm{proj}}, b_{\mathrm{proj}}$)

This strict sharing enforces that both question and answer encoders generate aligned representations.
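
The pooling and projection steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the Transformer encoder is not implemented, so the per-token hidden states are random placeholders, and the names `mean_pool`, `SharedProjection`, and `embed`, as well as the toy dimensions, are hypothetical rather than taken from Dong et al. (2022). The point it shows is that one projection object serves both towers.

```python
import random


def mean_pool(hidden_states):
    """Average per-token hidden states (a list of D_h-dim lists) into one D_h vector."""
    n = len(hidden_states)
    return [sum(col) / n for col in zip(*hidden_states)]


class SharedProjection:
    """Linear layer v = W h + b, with W of shape (D_e, D_h) and b of shape (D_e,)."""

    def __init__(self, d_h, d_e, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(d_h)] for _ in range(d_e)]
        self.b = [0.0] * d_e

    def __call__(self, h):
        return [sum(w * x for w, x in zip(row, h)) + b
                for row, b in zip(self.W, self.b)]


def embed(hidden_states, projection):
    """One tower: pool the encoder output, then project it."""
    return projection(mean_pool(hidden_states))


# Toy sizes for illustration only (the article lists D_h = 512, 768, or 1024).
D_H, D_E = 4, 4
proj = SharedProjection(D_H, D_E)  # a single set of weights for both towers

# Placeholder hidden states standing in for the shared Transformer's output.
rng = random.Random(1)
question_states = [[rng.random() for _ in range(D_H)] for _ in range(3)]  # 3 tokens
answer_states = [[rng.random() for _ in range(D_H)] for _ in range(5)]    # 5 tokens

v_q = embed(question_states, proj)
v_a = embed(answer_states, proj)
```

Because both calls to `embed` receive the same `proj` object, any update to its weights during training would affect the question and answer embeddings alike.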

2. Formal Mathematical Description

Let $q$ denote a query and $a$ a candidate context. The towers compute:

  • Encoding:

$$h_q = \mathrm{Encoder}(q) \in \mathbb{R}^{D_h}$$

$$h_a = \mathrm{Encoder}(a) \in \mathbb{R}^{D_h}$$

  • Projection:

$$v_q = W_{\mathrm{proj}} h_q + b_{\mathrm{proj}} \in \mathbb{R}^{D_e}$$

$$v_a = W_{\mathrm{proj}} h_a + b_{\mathrm{proj}} \in \mathbb{R}^{D_e}$$

  • Resulting scoring towers:

$$f(q) = v_q, \quad g(a) = v_a$$

Because the towers share all parameters, $f$ and $g$ are the same function: identical inputs receive identical embeddings, and questions and answers are placed in a single embedding space (Dong et al., 2022).

3. Similarity Scoring and Objective Function

The SDE employs similarity functions for ranking:

  • Dot product:

$$s(q, a) = v_q^\top v_a$$

  • Cosine similarity:

$$s(q, a) = \frac{v_q^\top v_a}{\|v_q\| \|v_a\|}$$
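
As a small worked example, both scoring functions can be written directly from the formulas above. The vectors and candidate names below are made-up illustration values, not embeddings from any trained model.

```python
import math


def dot(u, v):
    """Dot-product score s(q, a) = v_q^T v_a."""
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    """Cosine score: the dot product divided by the product of the two norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


v_q = [1.0, 2.0, 2.0]
v_a = [2.0, 4.0, 4.0]

print(dot(v_q, v_a))     # 18.0
print(cosine(v_q, v_a))  # 1.0 (v_a is a positive multiple of v_q)

# Ranking hypothetical candidates for one query by dot-product score.
candidates = {
    "a1": [2.0, 4.0, 4.0],
    "a2": [0.0, 1.0, 0.0],
    "a3": [-1.0, 0.0, 0.0],
}
ranked = sorted(candidates, key=lambda name: dot(v_q, candidates[name]), reverse=True)
print(ranked)  # ['a1', 'a2', 'a3']
```

The example also shows the practical difference between the two scores: the dot product grows with vector length, while cosine similarity depends only on direction.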

Training utilizes in-batch negatives with a softmax-contrastive loss:

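$$\mathcal{L} = -\sum_{i=1}^N \log \frac{ \exp(\mathrm{sim}(q_i, a^+_i) / \tau) }{ \sum_{j=1}^N \exp(\mathrm{sim}(q_i, a_j) / \tau) }$$

where $a^+_i$ is the positive context for query $q_i$, the contexts $a_j$ of the other queries in the batch of $N$ pairs serve as negatives, $\mathrm{sim}$ is one of the similarity functions $s$ above, and $\tau$ is a temperature hyperparameter (empirically, $\tau \approx 1$).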
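
A minimal sketch of this loss in plain Python follows. It assumes the dot product as the similarity function and assumes that the positive for query `i` sits at index `i` of the answer batch; the function name `in_batch_softmax_loss` and the toy embeddings are hypothetical. The log-sum-exp is computed with the usual max-subtraction for numerical stability.

```python
import math


def in_batch_softmax_loss(q_embs, a_embs, tau=1.0):
    """Sum over i of -log( exp(s_ii / tau) / sum_j exp(s_ij / tau) ).

    q_embs[i] and a_embs[i] form a positive pair; every other a_embs[j]
    in the batch acts as a negative for q_embs[i].
    """
    total = 0.0
    for i, q in enumerate(q_embs):
        logits = [sum(x * y for x, y in zip(q, a)) / tau for a in a_embs]
        m = max(logits)  # subtract the max before exponentiating
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total


# Toy batch of N = 2 pairs.
q_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # each positive matches its query
swapped = [[0.0, 1.0], [1.0, 0.0]]  # each positive matches the other query

loss_aligned = in_batch_softmax_loss(q_embs, aligned)
loss_swapped = in_batch_softmax_loss(q_embs, swapped)
```

With these values the aligned batch gives a lower loss than the swapped one, which is the behaviour the objective is meant to reward.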

4. Empirical Evaluation and Comparative Results

Across QA and IR tasks (MS MARCO, open-domain Natural Questions (NQ), MultiReQA), the SDE consistently outperforms Asymmetric Dual Encoder (ADE) baselines and most parameter-sharing ablations when the training protocol and model size are held constant.

Summary of Key Results (T5-base, $D_h = D_e = 768$):

| Task | Metric | SDE | ADE | ADE-SPL |
|---|---|---|---|---|
| MS MARCO (dev) | P@1 | 15.92% | 14.20% | 15.46% |
| MS MARCO (dev) | MRR@10 | 28.49% | 26.31% | 28.20% |
| Open-domain NQ | Top-5 acc. | 62.2% | 57.6% | 62.7% |
| Open-domain NQ | Top-20 acc. | 77.0% | 73.2% | 76.4% |
| Open-domain NQ | Top-100 acc. | 84.6% | 82.7% | 84.4% |
| MultiReQA (SQuAD) | P@1 / MRR | 70.13 / 78.44 | 60.39 / 70.33 | 69.39 / 77.65 |

  • Here, P@1 denotes Precision@1; MRR denotes Mean Reciprocal Rank.
  • ADE-SPL (ADE+Shared Projection Layer) approaches SDE’s performance across most tasks.
  • Sharing only the token embedder or freezing it (ADE-STE, ADE-FTE) yields small incremental improvements over ADE.
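
For readers unfamiliar with the metrics, the sketch below shows one common way to compute Precision@1, MRR with a rank cutoff, and top-k accuracy from ranked retrieval lists. The helper names and the three toy queries are hypothetical, and the exact evaluation scripts used for the reported numbers may differ in details.

```python
def precision_at_1(ranked_ids, relevant_ids):
    """1.0 if the top-ranked item is relevant, else 0.0."""
    return 1.0 if ranked_ids and ranked_ids[0] in relevant_ids else 0.0


def reciprocal_rank(ranked_ids, relevant_ids, cutoff=None):
    """1 / rank of the first relevant item within the cutoff, or 0.0 if none."""
    top = ranked_ids if cutoff is None else ranked_ids[:cutoff]
    for rank, item in enumerate(top, start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0


def top_k_hit(ranked_ids, relevant_ids, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant_ids for item in ranked_ids[:k]) else 0.0


def mean(values):
    return sum(values) / len(values)


# Three toy queries: (ranked candidate ids, set of relevant ids).
queries = [
    (["d3", "d1", "d7"], {"d3"}),  # relevant item at rank 1
    (["d2", "d9", "d4"], {"d4"}),  # relevant item at rank 3
    (["d5", "d6", "d8"], {"d0"}),  # relevant item not retrieved
]

p_at_1 = mean([precision_at_1(r, rel) for r, rel in queries])
mrr_at_10 = mean([reciprocal_rank(r, rel, cutoff=10) for r, rel in queries])
top_2_acc = mean([top_k_hit(r, rel, k=2) for r, rel in queries])
```

Here `p_at_1` and `top_2_acc` are both 1/3, while `mrr_at_10` is (1 + 1/3 + 0) / 3, since MRR also gives partial credit to the query whose answer is at rank 3.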

5. Analysis of Embedding Spaces

t-SNE visualization of embedding spaces provides additional evidence:

  • Without shared projection (ADE, ADE-STE, ADE-FTE), question and answer embeddings form two nearly disjoint clusters in 2D t-SNE space. This suggests a significant semantic misalignment between towers, reducing retrieval quality.
  • With shared projection (SDE, ADE-SPL), question and answer embeddings overlap and intermingle in the t-SNE space, reflecting a common semantic space; this coincides with the improved retrieval metrics of these variants (Dong et al., 2022).

6. Strengths, Limitations, and Practical Recommendations

Strengths of SDE:

  • Parameter sharing tightly enforces a joint semantic embedding space.
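  • Retrieval performance is at or near the best among the compared dual-encoder variants across tasks and model sizes; in the T5-base results, only ADE-SPL edges ahead, and only on NQ Top-5 accuracy.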

Limitations:

  • Cross-tower interactions are limited to scoring (dot/cosine); the architecture does not model richer cross-input attention or late interaction.
  • The analysis is restricted to dual-encoder models; hybrid or late-interaction methods are not addressed.

Practical Guidance:

  • SDE, with total weight sharing, is the empirically preferred design for homogeneous dual-encoder QA and IR settings.
  • If strict symmetry is impractical (e.g., for highly dissimilar input modalities), sharing at least the projection layer (ADE-SPL) can recover most of the performance gains of full SDE at minimal implementation cost (Dong et al., 2022).
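
The difference between the two recommended designs comes down to which components are reused. The sketch below illustrates this with toy linear maps standing in for the Transformer encoders; `make_linear`, the seeds, and the dimensions are hypothetical and chosen only to make the contrast visible.

```python
import random


def make_linear(d_in, d_out, seed):
    """Return a function x -> W x + b with fixed random weights (a toy stand-in)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    b = [0.0] * d_out

    def apply(x):
        return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

    return apply


D_IN, D_H, D_E = 3, 4, 4

shared_projection = make_linear(D_H, D_E, seed=3)

# SDE: one encoder and one projection, used for questions and answers alike.
shared_encoder = make_linear(D_IN, D_H, seed=1)


def sde_embed(x):
    return shared_projection(shared_encoder(x))


# ADE-SPL: separate encoders per tower, but the projection is still shared.
question_encoder = make_linear(D_IN, D_H, seed=1)
answer_encoder = make_linear(D_IN, D_H, seed=2)


def ade_spl_embed_question(x):
    return shared_projection(question_encoder(x))


def ade_spl_embed_answer(x):
    return shared_projection(answer_encoder(x))


x = [1.0, 0.5, -0.5]
same_in_sde = sde_embed(x) == sde_embed(x)  # True: both towers are one function
differs_in_ade_spl = ade_spl_embed_question(x) != ade_spl_embed_answer(x)
```

In this sketch, moving from a fully asymmetric design to ADE-SPL only requires passing the same projection object to both towers, which is consistent with the low implementation cost noted above.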