Semantic Beam Search in Dense Retrieval

Updated 11 May 2026
  • Semantic Beam Search is a retrieval algorithm that extends traditional beam search to dense embedding spaces by maintaining multiple evidence chains for multi-hop reasoning.
  • It employs a query encoder, a passage encoder, and a composition module to iteratively refine query embeddings at each retrieval hop.
  • Empirical evaluations on benchmarks like HotpotQA demonstrate significant improvements in recall and QA performance over single-hop and earlier multi-hop approaches.

Semantic Beam Search is a retrieval algorithm designed for multi-step, evidence-chain construction directly in a dense embedding space, as operationalized in the Beam Dense Retrieval (BeamDR) architecture. Generalizing traditional beam search from sequence modeling, Semantic Beam Search adaptively maintains multiple promising partial chains of evidence passages, composing query embeddings at each step to facilitate multi-hop semantic reasoning in large unstructured corpora. It forms the core of the BeamDR model for multi-hop question answering, achieving substantial improvements in recall and downstream QA metrics over both single-hop dense and earlier multi-hop retrieval methods (Zhao et al., 2021).

1. Formal Definition

Semantic Beam Search extends classic beam search to the domain of dense retrieval. In this adaptation, a “beam” of $B$ partial evidence chains is maintained throughout the retrieval process, with each chain represented by a current embedding vector. At each hop, every chain is expanded by retrieving the top-$K$ most semantically similar next passages, determined by inner-product or cosine similarity in the embedding space. These expanded chains are scored and pruned to retain only the top $B$ across the global pool, enabling simultaneous tracking of multiple highly promising multi-step reasoning pathways.

This procedure allows the retrieval system to incrementally compose a query embedding that reflects both the original question and the semantic context accumulated from previously retrieved passages, forming a chain of evidence that supports complex multi-hop reasoning.
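The expansion step for a single chain can be sketched as a top-$K$ lookup over passage embeddings. This is a minimal illustration, not the BeamDR implementation: the helper names (`dot`, `cosine`, `top_k`) and the toy 2-dimensional vectors are assumptions made for the example, and a real system would use an approximate nearest-neighbor index rather than a full scan.

```python
import heapq
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def top_k(chain_emb, passage_embs, k, sim=dot):
    """Return the k (passage_id, score) pairs most similar to chain_emb."""
    scored = ((pid, sim(chain_emb, emb)) for pid, emb in passage_embs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy corpus with hypothetical values, for illustration only.
passages = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [3.0, 3.0]}
query = [1.0, 0.2]

by_dot = top_k(query, passages, k=2)              # ranks p3 first, then p1
by_cos = top_k(query, passages, k=2, sim=cosine)  # ranks p1 first, then p3
```

The two rankings differ because the inner product rewards passages with large embedding norms, while cosine similarity compares direction only.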

2. Architectural Components

The BeamDR architecture implementing Semantic Beam Search consists of three main modules:

  • Query Encoder $E_Q(\cdot)$: Maps the input question $Q$ to a $d$-dimensional dense vector, typically using a BERT-style encoder in a dual-encoder setup.
  • Passage Encoder $E_P(\cdot)$: Independently encodes any corpus passage $p$ into a $d$-dimensional dense vector, with the same architecture as $E_Q$ but not necessarily sharing weights.
  • Composition Module: Combines the current chain embedding with a retrieved passage embedding, realized as a small feed-forward network, element-wise addition, or addition with layer normalization. The composition updates the semantic context, ensuring that the embedding for each chain reflects all accumulated evidence up to the current hop.

This architecture enables explicit, iterative tracking of reasoning chains and adaptive refinement of query representations after each retrieval step.
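Two of the composition variants named above, element-wise addition and addition followed by layer normalization, can be sketched in a few lines. The function names are hypothetical, and the layer normalization here omits the learned gain and bias that a trained model would normally include; the feed-forward variant is not shown.

```python
import math

def compose_add(chain_emb, passage_emb):
    """Element-wise addition of chain and passage embeddings."""
    return [c + p for c, p in zip(chain_emb, passage_emb)]

def layer_norm(vec, eps=1e-5):
    """Normalize to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def compose_add_norm(chain_emb, passage_emb):
    """Addition followed by layer normalization."""
    return layer_norm(compose_add(chain_emb, passage_emb))

# Toy 4-d embeddings with hypothetical values.
q = [0.5, -1.0, 2.0, 0.0]
p = [1.5, 1.0, 0.0, -2.0]

added = compose_add(q, p)        # [2.0, 0.0, 2.0, -2.0]
normed = compose_add_norm(q, p)  # same direction of change, rescaled
```

Normalizing after each addition keeps the chain embedding at a stable scale, so inner-product scores at later hops are not inflated simply because more passages have been summed in.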

3. Semantic Beam Search Procedure

The Semantic Beam Search algorithm proceeds as follows (Zhao et al., 2021), writing $q_t$ for the embedding of a chain after $t$ hops and $T$ for the total number of hops:

  1. Initialization: Compute the initial embedding for the input question, $q_0 = E_Q(Q)$, and initialize the beam with empty chains and zero scores.
  2. Iterative Retrieval (for $t = 1, \dots, T$):
    • For each beam entry, compute its current chain embedding $q_{t-1}$.
    • For all passages $p$ in the corpus, compute similarities $\mathrm{sim}(q_{t-1}, p)$ using inner-product or cosine similarity, e.g., for the inner product:

      $$\mathrm{sim}(q_{t-1}, p) = q_{t-1}^\top E_P(p)$$

    • For each beam entry, select the top $K$ passages with highest similarity.
    • For each (chain, $p$, sim) in the top $K$, update:
      - Chain: append $p$,
      - Score: accumulate similarity,
      - Embedding: compose $q_t$ from $q_{t-1}$ and $E_P(p)$, e.g.,

      $$q_t = q_{t-1} + E_P(p)$$

      or

      $$q_t = \mathrm{LayerNorm}\big(q_{t-1} + E_P(p)\big)$$

    • Aggregate all expanded candidates from all beam entries and globally prune to retain only the top $B$ by total accumulated score.
  3. Completion: After $T$ hops, output the top-scoring evidence chains, each of length $T$.
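The steps above can be sketched end to end in plain Python. This is an illustrative sketch under stated assumptions rather than the BeamDR code: it uses inner-product similarity, element-wise addition as the composition, an exhaustive scan instead of an index, and it skips passages already in a chain (a choice made here, not taken from the source). All names and the toy 3-dimensional vectors are hypothetical.

```python
import heapq

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def compose(chain_emb, passage_emb):
    """Assumed composition: element-wise addition."""
    return [c + p for c, p in zip(chain_emb, passage_emb)]

def semantic_beam_search(question_emb, passage_embs, beam_size, top_k, hops):
    """Return up to beam_size (score, chain) pairs, best first."""
    # Each beam entry: (accumulated score, chain of passage ids, chain embedding).
    beam = [(0.0, (), question_emb)]
    for _ in range(hops):
        candidates = []
        for score, chain, emb in beam:
            # Score every passage not yet in this chain against the chain embedding.
            scored = [(dot(emb, p_emb), pid)
                      for pid, p_emb in passage_embs.items() if pid not in chain]
            for sim, pid in heapq.nlargest(top_k, scored):
                candidates.append((score + sim, chain + (pid,),
                                   compose(emb, passage_embs[pid])))
        # Global pruning: keep the best chains across all expansions.
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return [(score, chain) for score, chain, _ in beam]

# Toy corpus: "answer" is orthogonal to the question and only becomes
# reachable after "bridge" has been composed into the chain embedding.
passages = {
    "bridge": [0.9, 0.8, 0.0],
    "answer": [0.0, 1.0, 0.0],
    "distractor": [0.4, -0.5, 0.3],
}
question = [1.0, 0.0, 0.0]

results = semantic_beam_search(question, passages, beam_size=2, top_k=2, hops=2)
best_score, best_chain = results[0]  # best_chain is ("bridge", "answer")
```

In this toy setup the question alone scores "answer" at 0, so single-hop retrieval would rank it last; the composed embedding after "bridge" is what surfaces it at the second hop.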

The following table summarizes the key variables and workflow parameters:

| Symbol | Description | Typical Value |
|---|---|---|
| $B$ | Beam size (number of parallel chains kept) | 3 |
| $T$ | Number of retrieval hops | 2 |
| $K$ | Number of expansions per beam at each hop | (configurable) |
| $d$ | Embedding dimensionality | (model-dependent) |

4. Training Objective

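The model is trained to maximize the probability of selecting the correct next passage in a gold evidence chain $(p_1^*, \dots, p_T^*)$ at each hop. The objective is hop-wise negative log-likelihood with a softmax over candidate similarities:

$$\mathcal{L} = -\sum_{t=1}^{T} \log \frac{\exp\big(\mathrm{sim}(q_{t-1}, p_t^*)\big)}{\sum_{p \in \mathcal{C}_t} \exp\big(\mathrm{sim}(q_{t-1}, p)\big)} + \lambda \lVert \theta \rVert_2^2$$

where $q_{t-1}$ is the chain embedding entering hop $t$ (with $q_0 = E_Q(Q)$), $\mathcal{C}_t$ is the set of candidate passages at hop $t$, $\theta$ are the composition network parameters, and $\lambda$ is the regularization coefficient. This encourages the model to score the gold passage higher at each reasoning step, integrating semantic relevance as new evidence is composed into the chain.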
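The hop-wise likelihood term can be illustrated numerically. The sketch below assumes inner-product similarity and additive composition, omits the regularization term, and treats the candidate list at each hop as given; the function names and toy vectors are hypothetical and only meant to show how the per-hop softmax losses add up along a gold chain.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def compose_add(chain_emb, passage_emb):
    """Assumed composition: element-wise addition."""
    return [c + p for c, p in zip(chain_emb, passage_emb)]

def hop_nll(chain_emb, candidate_embs, gold_index):
    """Negative log-softmax probability of the gold passage among candidates."""
    logits = [dot(chain_emb, c) for c in candidate_embs]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold_index]

def chain_loss(question_emb, hops, compose=compose_add):
    """Sum hop-wise NLL along a gold chain.

    hops: list of (candidate_embs, gold_index) pairs, one per hop.
    The gold passage is composed into the chain embedding after each hop.
    """
    emb, total = question_emb, 0.0
    for candidate_embs, gold_index in hops:
        total += hop_nll(emb, candidate_embs, gold_index)
        emb = compose(emb, candidate_embs[gold_index])
    return total

# Two hops with two candidates each (hypothetical values); gold is index 0.
question = [1.0, 0.0]
gold_chain = [
    ([[1.0, 1.0], [0.0, 0.0]], 0),  # hop 1: gold scores 1, negative scores 0
    ([[0.0, 1.0], [1.0, 0.0]], 0),  # hop 2: gold scores 1, negative scores 2
]
loss = chain_loss(question, gold_chain)
```

Here the second hop contributes more loss than the first because the negative passage outscores the gold one under the composed embedding, which is the situation training is meant to correct.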

5. Beam Maintenance and Pruning Strategy

At each retrieval hop, each of the $B$ beam entries spawns $K$ children via its top-$K$ expansions, resulting in $B \times K$ candidates. Each candidate chain is scored by summing inner-product similarities over its path. Global pruning is then applied: only the top $B$ chains (in terms of accumulated chain score) are kept for the next hop. After $T$ hops, surviving chains represent complete reasoning trajectories of length $T$, ranked by their evidence accumulation score.

A plausible implication is that this beam pruning mechanism concentrates computation and memory on the most promising reasoning paths, enabling scalable, effective multi-hop retrieval.
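A small counting sketch makes the effect of pruning concrete. It assumes the beam starts from a single empty chain (so the first hop produces fewer candidates than later hops) and compares the number of chains scored per hop with and without global pruning; the function names and the parameter values in the example are hypothetical.

```python
def candidates_per_hop(beam_size, top_k, hops):
    """Number of candidate chains scored at each hop with global pruning."""
    counts, live = [], 1  # the beam starts with a single empty chain
    for _ in range(hops):
        expanded = live * top_k
        counts.append(expanded)
        live = min(beam_size, expanded)  # prune back to at most beam_size chains
    return counts

def unpruned_per_hop(top_k, hops):
    """Number of chains at each hop if every expansion were kept."""
    return [top_k ** t for t in range(1, hops + 1)]

pruned = candidates_per_hop(beam_size=3, top_k=10, hops=4)  # [10, 30, 30, 30]
unpruned = unpruned_per_hop(top_k=10, hops=4)               # [10, 100, 1000, 10000]
```

With pruning, the per-hop candidate count is bounded by the beam size times the expansion width regardless of depth, whereas keeping every expansion grows exponentially in the number of hops.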

6. Empirical Impact in Multi-Hop QA

Semantic Beam Search as instantiated in the BeamDR model was evaluated on the HotpotQA dataset (both “distractor” and full Wikipedia settings) (Zhao et al., 2021). With $B = 3$ and $T = 2$:

  • First-hop recall@10: ≈ 92% (vs. 88% for single-hop dense retrieval)

  • Second-hop recall@10, conditioned on the first: ≈ 80% (vs. 65%)

  • End-to-end chain recall@10 (full evidence chain correct): ≈ 75% (vs. 55%)

  • QA metrics (with downstream reader):

    • BeamDR+reader: 67.5 F1 (distractor), 60.3 F1 (full Wiki)
    • Previous best multi-hop: ~62 F1 (distractor), ~55 F1 (full Wiki)
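
As a rough consistency check on the recall figures above, assume (this is not stated in the source) that end-to-end chain recall on two-hop questions is approximately the first-hop recall multiplied by the conditional second-hop recall:

$$0.92 \times 0.80 \approx 0.74 \qquad \text{and} \qquad 0.88 \times 0.65 \approx 0.57$$

Both products are close to the reported chain recalls (≈ 75% and ≈ 55%), provided the parenthesized comparison figures all refer to the same baseline system.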

Across both retrieval and question-answering metrics, Semantic Beam Search yields substantial gains over single-hop dense retrieval and prior sparse/graph-based multi-hop approaches. This suggests that maintaining multiple evidence chains and composing semantic context at each retrieval step is crucial for capturing the latent reasoning required in multi-step tasks.

7. Significance and Implications

Semantic Beam Search represents a principled extension of beam search into dense embedding spaces, directly enabling multi-hop semantic reasoning with unstructured text. Its core contributions are (a) tracking and updating multiple partial evidence chains, (b) enriching chain representations through composition of embeddings at each step, and (c) aggressive global pruning based on cumulative semantic similarity. The approach substantially reduces reliance on semi-structured knowledge, demonstrating strong empirical performance on multi-step retrieval and complex question-answering tasks. These results confirm the importance of iterative, compositional retrieval and parallel hypothesis maintenance for multi-hop reasoning over unstructured text (Zhao et al., 2021).
