Semantic Beam Search in Dense Retrieval

Updated 11 May 2026

Semantic Beam Search is a retrieval algorithm that extends traditional beam search to dense embedding spaces by maintaining multiple evidence chains for multi-hop reasoning.
It employs a query encoder, a passage encoder, and a composition module to iteratively refine query embeddings at each retrieval hop.
Empirical evaluations on benchmarks like HotpotQA demonstrate significant improvements in recall and QA performance over single-hop and earlier multi-hop approaches.

Semantic Beam Search is a retrieval algorithm designed for multi-step evidence-chain construction directly in a dense embedding space, as operationalized in the Beam Dense Retrieval (BeamDR) architecture. Generalizing traditional beam search from sequence modeling, it maintains multiple promising partial chains of evidence passages and composes a new query embedding at each step, which supports multi-hop semantic reasoning over large unstructured corpora. It forms the core of the BeamDR model for multi-hop question answering, achieving substantial improvements in recall and downstream QA metrics over both single-hop dense and earlier multi-hop retrieval methods (Zhao et al., 2021).

1. Formal Definition

Semantic Beam Search extends classic beam search to the domain of dense retrieval. In this adaptation, a “beam” of $B$ partial evidence chains is maintained throughout the retrieval process, with each chain represented by a current embedding vector. At each hop, every chain is expanded by retrieving the top-$K$ most semantically similar next passages, determined by inner-product or cosine similarity in the embedding space. These expanded chains are scored and pruned to retain only the top $B$ across the global pool, enabling simultaneous tracking of multiple highly promising multi-step reasoning pathways.

This procedure allows the retrieval system to incrementally compose a query embedding that reflects both the original question and the semantic context accumulated from previously retrieved passages, forming a chain of evidence that supports complex multi-hop reasoning.
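The expansion step described above can be illustrated with a minimal sketch. It assumes plain Python lists stand in for embedding vectors and that the corpus is small enough to score exhaustively; the function names and the toy corpus are hypothetical and not taken from the BeamDR implementation.

```python
import heapq
import math

def inner_product(u, v):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: the inner product of the length-normalized vectors."""
    return inner_product(u, v) / (math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v)))

def top_k_passages(query_vec, passage_vecs, k, sim=inner_product):
    """Return the k (passage_id, similarity) pairs with the highest similarity."""
    scored = [(pid, sim(query_vec, vec)) for pid, vec in passage_vecs.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy 2-dimensional corpus (hypothetical values, for illustration only).
passages = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.5, 0.5]}
top2 = top_k_passages([1.0, 0.2], passages, k=2)
print([pid for pid, _ in top2])
```

Passing `sim=cosine` switches the ranking to cosine similarity; the two measures can rank passages differently when passage norms vary.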

2. Architectural Components

The BeamDR architecture implementing Semantic Beam Search consists of three main modules:

Query Encoder $E_Q(\cdot)$: Maps the input question $Q$ to a $d$-dimensional dense vector, typically with a BERT-style Transformer encoder used in a dual-encoder setup.
Passage Encoder $E_P(\cdot)$: Independently encodes any corpus passage $p$ into a $d$-dimensional dense vector, using the same architecture as $E_Q$ but not necessarily sharing weights.
Composition Module $f_\theta(\cdot, \cdot)$: Combines the current chain embedding $q_{t-1}$ with a retrieved passage embedding $E_P(p)$ to produce the updated chain embedding $q_t$. It is realized as a small feed-forward network, element-wise addition, or addition with layer normalization. The composition updates the semantic context, ensuring that the embedding for each chain reflects all accumulated evidence up to the current hop.

This architecture enables explicit, iterative tracking of reasoning chains and adaptive refinement of query representations after each retrieval step.
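Two of the composition options named above (element-wise addition, and addition followed by layer normalization) can be sketched as follows. This is an illustrative sketch only: the layer normalization here has no learned gain or bias, the `eps` value is an assumed default, and the function names are hypothetical.

```python
import math

def compose_add(chain_vec, passage_vec):
    """Element-wise addition of the chain embedding and the passage embedding."""
    return [c + p for c, p in zip(chain_vec, passage_vec)]

def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def compose_add_norm(chain_vec, passage_vec):
    """Addition followed by layer normalization (no learned parameters here)."""
    return layer_norm(compose_add(chain_vec, passage_vec))

chain = [1.0, 2.0]      # current chain embedding (toy values)
passage = [3.0, 4.0]    # retrieved passage embedding (toy values)
print(compose_add(chain, passage))
print(compose_add_norm(chain, passage))
```

Plain addition lets the embedding norm grow with each hop, whereas the normalized variant keeps the chain embedding on a fixed scale, which may matter when similarities are accumulated across hops.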

3. Semantic Beam Search Procedure

The Semantic Beam Search algorithm proceeds as follows (notational conventions as in Zhao et al., 2021):

Initialization: Compute the initial embedding for the input question, $q_0 = E_Q(Q)$, and initialize the beam with empty chains and zero scores.
Iterative Retrieval (for $t = 1, \dots, T$):
- For each beam entry, compute its current chain embedding $q_{t-1}$.
- For all passages $p$ in the corpus, compute similarities $s(q_{t-1}, p)$ using inner-product or cosine similarity:

$$s(q_{t-1}, p) = q_{t-1}^\top E_P(p) \quad \text{or} \quad s(q_{t-1}, p) = \frac{q_{t-1}^\top E_P(p)}{\lVert q_{t-1} \rVert \, \lVert E_P(p) \rVert}$$

- For each beam entry, select the top $K$ passages with highest similarity.
- For each (chain, $p$, sim) triple in the top $K$, update:
  - Chain: append $p$,
  - Score: accumulate sim,
  - Embedding: $q_t = f_\theta(q_{t-1}, E_P(p))$, where $f_\theta$ denotes the composition module, e.g.,

$$q_t = q_{t-1} + E_P(p)$$

or

$$q_t = \mathrm{LayerNorm}\big(q_{t-1} + E_P(p)\big)$$

- Aggregate all expanded candidates from all beam entries and globally prune to retain only the top $B$ by total accumulated score.

Completion: After $T$ hops, output the top-scoring evidence chains, each of length $T$.
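The procedure above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the BeamDR implementation: passages are scored exhaustively instead of through a nearest-neighbour index, composition is plain element-wise addition, passages already in a chain are skipped (an assumption not stated in the source), and all names are hypothetical.

```python
import heapq

def dot(u, v):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def semantic_beam_search(question_vec, passage_vecs, beam_size, expansions, hops):
    """Return up to beam_size (score, chain) pairs, highest score first."""
    # Each beam entry is (accumulated score, chain of passage ids, chain embedding).
    beam = [(0.0, [], list(question_vec))]
    for _ in range(hops):
        candidates = []
        for score, chain, chain_vec in beam:
            # Score every passage not already in this chain (assumption).
            sims = [(dot(chain_vec, vec), pid)
                    for pid, vec in passage_vecs.items() if pid not in chain]
            for sim, pid in heapq.nlargest(expansions, sims):
                # Compose the new chain embedding by element-wise addition.
                new_vec = [c + p for c, p in zip(chain_vec, passage_vecs[pid])]
                candidates.append((score + sim, chain + [pid], new_vec))
        # Global pruning: keep the best beam_size candidates across all entries.
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return [(score, chain) for score, chain, _ in beam]

# Toy corpus with hypothetical 2-dimensional embeddings.
corpus = {"a": [1.0, 0.0], "b": [0.5, 1.0], "c": [-1.0, 0.0]}
results = semantic_beam_search([1.0, 0.0], corpus, beam_size=2, expansions=2, hops=2)
for score, chain in results:
    print(score, chain)
```

One property of this particular configuration is worth noting: with additive composition and inner-product scoring, the accumulated score of a two-passage chain does not depend on the order in which the passages were retrieved, so both orderings of the same pair can occupy the beam.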

The following table summarizes the key variables and workflow parameters:

Symbol	Description	Typical Value
$B$	Beam size (number of parallel chains kept)	3
$T$	Number of retrieval hops	2
$K$	Number of expansions per beam entry at each hop	(configurable)
$d$	Embedding dimensionality	(model-dependent)

4. Training Objective

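The model is trained to maximize the probability of selecting the correct next passage in a gold evidence chain $(p_1^*, \dots, p_T^*)$ at each hop. The objective is hop-wise negative log-likelihood with a softmax over candidate similarities:

$$\mathcal{L} = -\sum_{t=1}^{T} \log \frac{\exp\big(s(q_{t-1}, p_t^*)\big)}{\sum_{p \in \mathcal{C}_t} \exp\big(s(q_{t-1}, p)\big)} + \lambda \lVert \theta \rVert_2^2$$

where $s(q_{t-1}, p)$ is the similarity between the chain embedding before hop $t$ and passage $p$, $\mathcal{C}_t$ is the set of candidate passages scored at hop $t$, $\theta$ are the composition network parameters, and $\lambda$ is the regularization coefficient. This encourages the model to score the gold passage higher at each reasoning step, integrating semantic relevance as new evidence is composed into the chain.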
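The hop-wise softmax loss described above can be computed as in the following sketch. It covers only the negative log-likelihood term and omits regularization; the function names and the toy similarity values are hypothetical.

```python
import math

def hop_nll(similarities, gold_index):
    """Negative log-likelihood of the gold passage under a softmax over similarities."""
    max_sim = max(similarities)  # subtract the max for numerical stability
    log_norm = max_sim + math.log(sum(math.exp(s - max_sim) for s in similarities))
    return log_norm - similarities[gold_index]

def chain_loss(per_hop_similarities, gold_indices):
    """Sum the per-hop losses along a gold evidence chain."""
    return sum(hop_nll(sims, gold)
               for sims, gold in zip(per_hop_similarities, gold_indices))

# Two hops, three candidates each; the gold passage is at index 0, then index 2.
loss = chain_loss([[2.0, 0.5, -1.0], [0.1, 0.3, 1.5]], [0, 2])
print(loss)
```

The loss shrinks as the gold passage's similarity grows relative to the other candidates at the same hop, which is the behaviour the training objective is meant to reward.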

5. Beam Maintenance and Pruning Strategy

At each retrieval hop, each of the $B$ beam entries spawns $K$ children via its top-$K$ expansions, resulting in $B \times K$ candidates. Each candidate chain is scored by summing inner-product similarities over its path. Global pruning is then applied: only the top $B$ chains (in terms of accumulated chain score) are kept for the next hop. After $T$ hops, surviving chains represent complete reasoning trajectories of length $T$, ranked by their evidence accumulation score.

A plausible implication is that this beam pruning mechanism concentrates computation and memory on the most promising reasoning paths, enabling scalable, effective multi-hop retrieval.
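The effect of pruning on the candidate pool can be made concrete with a small counting sketch. It assumes the first hop starts from a single question-only beam entry, and the expansion count of 10 is a hypothetical value chosen only for the arithmetic.

```python
def candidates_per_hop(beam_size, expansions, hops):
    """Number of candidate chains scored at each hop when pruning to beam_size."""
    counts, live = [], 1  # hop 1 starts from the single question-only entry
    for _ in range(hops):
        counts.append(live * expansions)
        live = min(beam_size, live * expansions)
    return counts

def unpruned_chains(expansions, hops):
    """Number of chains after the final hop if no pruning were applied."""
    return expansions ** hops

print(candidates_per_hop(beam_size=3, expansions=10, hops=2))  # [10, 30]
print(unpruned_chains(expansions=10, hops=2))                  # 100
```

With pruning, the pool stays bounded by the product of beam size and expansions at every hop, whereas the unpruned count grows exponentially in the number of hops.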

6. Empirical Impact in Multi-Hop QA

Semantic Beam Search as instantiated in the BeamDR model was evaluated on the HotpotQA dataset (both “distractor” and full Wikipedia settings) (Zhao et al., 2021). With $B = 3$ and $T = 2$:

First-hop recall@10: ≈ 92% (vs. 88% for single-hop dense retrieval)
Second-hop recall@10, conditioned on the first: ≈ 80% (vs. 65%)
End-to-end chain recall@10 (full evidence chain correct): ≈ 75% (vs. 55%)
QA metrics (with downstream reader):
- BeamDR+reader: 67.5 F1 (distractor), 60.3 F1 (full Wiki)
- Previous best multi-hop: ~62 F1 (distractor), ~55 F1 (full Wiki)

Across both retrieval and question-answering metrics, Semantic Beam Search yields substantial gains over single-hop dense retrieval and prior sparse/graph-based multi-hop approaches. This suggests that maintaining multiple evidence chains and composing semantic context at each retrieval step is crucial for capturing the latent reasoning required in multi-step tasks.
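For readers unfamiliar with the chain-level metric, the sketch below shows one way a chain recall@k could be computed. It assumes a chain counts as correct only when it matches the gold chain exactly, including passage order, which may differ from the evaluation protocol used in the paper; the function name and the toy data are hypothetical and unrelated to the reported results.

```python
def chain_recall_at_k(retrieved_chains, gold_chains, k):
    """Fraction of questions whose gold chain appears among the top-k retrieved chains."""
    hits = 0
    for chains, gold in zip(retrieved_chains, gold_chains):
        top_k = {tuple(chain) for chain in chains[:k]}
        if tuple(gold) in top_k:
            hits += 1
    return hits / len(gold_chains)

# Toy example: ranked chains for two questions and their gold chains.
retrieved = [[["a", "b"], ["a", "c"]], [["x", "y"]]]
gold = [["a", "c"], ["x", "z"]]
print(chain_recall_at_k(retrieved, gold, k=2))  # 0.5
print(chain_recall_at_k(retrieved, gold, k=1))  # 0.0
```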

7. Significance and Implications

Semantic Beam Search represents a principled extension of beam search into dense embedding spaces, directly enabling multi-hop semantic reasoning with unstructured text. Its core contributions are (a) tracking and updating multiple partial evidence chains, (b) enriching chain representations through composition of embeddings at each step, and (c) aggressive global pruning based on cumulative semantic similarity. The approach substantially reduces reliance on semi-structured knowledge, demonstrating strong empirical performance on multi-step retrieval and complex question-answering tasks. These results confirm the importance of iterative, compositional retrieval and parallel hypothesis maintenance for multi-hop reasoning over unstructured text (Zhao et al., 2021).

References

Zhao et al. (2021). Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval.