
ThinkQE: Dynamic Query Expansion

Updated 6 December 2025
  • ThinkQE is a query expansion framework that employs multi-round chain-of-thought reasoning to generate diverse, multifaceted expanded queries.
  • It integrates iterative corpus feedback and redundancy filtering to capture varied interpretations and prevent semantic collapse in web search.
  • Empirical evaluations show ThinkQE achieving state-of-the-art zero-shot retrieval performance, rivaling supervised dense retrievers.

ThinkQE is a test-time query expansion (QE) framework developed for web search scenarios that demand both deep semantic exploration and result diversity. Unlike prior LLM-based QE methods that tend to produce narrow expansions by leveraging high-confidence completions, ThinkQE introduces a multi-round, chain-of-thought-guided query reformulation paired with iterative corpus feedback. The framework employs explicit reasoning by LLMs and tightly integrates document retrieval at each expansion step to generate richer, multifaceted expanded queries. ThinkQE achieves state-of-the-art zero-shot retrieval performance across multiple web benchmarks, rivaling or surpassing supervised dense retriever approaches without requiring additional training (Lei et al., 10 Jun 2025).

1. Motivation and Challenges in Query Expansion

Short, ambiguous user queries in web search often admit multiple divergent interpretations or facets. For example, the query "apple purchase" may refer to buying electronic devices, stock trading, or fruit acquisition. Traditional query expansion techniques—such as global term co-occurrence and pseudo-relevance feedback—tend to reinforce dominant semantic modes, resulting in expansions that are semantically narrow and insensitive to less probable but relevant interpretations. Even modern LLM-based expansions (e.g., HyDE, Query2Doc, LameR) tend to favor focused expansions due to the reliance on single-step, high-confidence completions.

ThinkQE directly addresses this by modeling an explicit "thinking process" for expansion, implementing chain-of-thought (CoT) prompting in the LLM, and incorporating iterative retrieval feedback to expand the query in multiple directions and prevent semantic collapse toward a single interpretation (Lei et al., 10 Jun 2025).

2. The ThinkQE Framework: Formal Specification

The ThinkQE workflow comprises multiple rounds, each consisting of retrieval, explicit reasoning, expansion, and query update. A notation summary and key routines are as follows:

Notation:

  • $q_0$: Original query
  • $C$: Document corpus
  • $\mathrm{BM25}(q, C)$: Lexical retrieval function
  • $D_t \subset C$: Top-$K$ documents at expansion round $t$
  • $e_t$: Expansion segment at round $t$
  • $q_t$: Current expanded query at start of round $t$
  • $B_t$: Set of documents "blacklisted" (seen) up to round $t$

2.1 Initial Retrieval

Start with a standard retrieval on the initial query:

$D_0 = \mathrm{TopK}(\mathrm{BM25}(q_0, C))$
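
As a concrete illustration, here is a minimal sketch of this step using Pyserini's Lucene BM25 searcher; the prebuilt index name is an assumption for illustration, and any BM25 index over $C$ would serve:

```python
# Minimal sketch of D_0 = TopK(BM25(q_0, C)) with Pyserini.
from pyserini.search.lucene import LuceneSearcher

# Prebuilt MS MARCO passage index; an assumption for illustration.
searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')

def bm25_topk(query: str, k: int = 5) -> list[tuple[str, str]]:
    """Return the top-k (docid, raw text) pairs for a query."""
    hits = searcher.search(query, k=k)
    return [(hit.docid, searcher.doc(hit.docid).raw()) for hit in hits]

D0 = bm25_topk("apple purchase")  # the ambiguous example from Section 1
```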

2.2 Thinking-Based Expansion

At each round $t$, given $q_{t-1}$ and $D_{t-1}$, prompt the LLM (specifically, the R1-distilled Qwen-14B model) to generate:

  • A chain-of-thought trace $T_t$
  • An expansion segment $e_t$

$[T_t,\, e_t] = \mathrm{LLM}_{\mathrm{think}}(q_0, D_{t-1})$

Update the expanded query as:

$q_t = q_{t-1} \oplus e_t$

($\oplus$: concatenation)
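
A sketch of one expansion call follows. It assumes an OpenAI-compatible endpoint serving the R1-distilled model and the usual R1 convention of wrapping the reasoning trace in `<think>...</think>` tags; the model id, prompt wording, and tag parsing are assumptions, not the paper's exact prompt.

```python
# Sketch of [T_t, e_t] = LLM_think(q_0, D_{t-1}).
import re
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible server hosting the model

def llm_think(q0: str, docs: list[str],
              model: str = "deepseek-r1-distill-qwen-14b") -> tuple[str, str]:
    """Return (T_t, e_t): the chain-of-thought trace and the expansion segment."""
    prompt = (
        f"Query: {q0}\n\nRetrieved documents:\n"
        + "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
        + "\n\nConsider the query's possible interpretations, then write a "
          "correct answering passage emphasizing knowledge beyond these snippets."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,  # sampling temperature reported in Section 3
    )
    text = resp.choices[0].message.content
    match = re.search(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
    trace, expansion = (match.group(1), match.group(2)) if match else ("", text)
    return trace.strip(), expansion.strip()
```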

To maintain original intent and prevent query drift, append $q_0$ multiple times at the end, with the repetition factor

$n = \frac{\mathrm{len}(\text{all expansions})}{\mathrm{len}(q_0) \times \lambda}, \quad \lambda = 3,$

and construct the final query $q_T^\star$.
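
A minimal sketch of this construction, assuming length is measured in whitespace-separated tokens (the exact length unit is an assumption here):

```python
# Sketch of the final query q_T* with original-query repetition.
def build_final_query(q0: str, expansions: list[str], lam: int = 3) -> str:
    """Concatenate all expansion segments, then append q_0 n times,
    n = len(all expansions) / (len(q_0) * lambda), with lambda = 3."""
    expanded = " ".join(expansions)
    n = max(1, len(expanded.split()) // (max(1, len(q0.split())) * lam))
    return " ".join([expanded] + [q0] * n)
```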

2.3 Corpus Interaction Strategy

Rather than static, parallel expansion, ThinkQE interleaves retrieval and expansion:

For each round $t$:

  1. Retrieval:

$R_t = \mathrm{BM25}(q_{t-1}, C)$

  2. Redundancy Filtering: Exclude all previously seen documents:

$D_t^{\mathrm{new}} = \mathrm{TopK}(R_t \setminus (B_{t-1} \cup D_{t-1}))$

Update the blacklist: $B_t = B_{t-1} \cup D_{t-1}$

  3. Expansion and Query Update: As in Section 2.2.

Empirically, $T = 3$ rounds and $K = 5$ documents per round balance retrieval efficacy and computational cost.
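
Putting the rounds together, here is a sketch of the full loop, reusing the `bm25_topk`, `llm_think`, and `build_final_query` helpers sketched above; for brevity it draws a single expansion per round, whereas the paper samples two candidates per round.

```python
# Sketch of the full ThinkQE loop: T rounds of retrieval, redundancy
# filtering, thinking-based expansion, and query accumulation.
def thinkqe(q0: str, T: int = 3, k: int = 5) -> str:
    blacklist: set[str] = set()   # B_t: docids already shown to the LLM
    expansions: list[str] = []    # accumulated segments e_1 ... e_T
    query = q0                    # q_t, grown each round

    for _ in range(T):
        # Over-fetch so redundancy filtering can still return k unseen docs.
        hits = bm25_topk(query, k=k + len(blacklist))
        new_docs = [(d, text) for d, text in hits if d not in blacklist][:k]
        blacklist.update(d for d, _ in new_docs)   # B_t = B_{t-1} ∪ D_{t-1}

        # Thinking-based expansion conditioned on the newly seen documents.
        _trace, e_t = llm_think(q0, [text for _, text in new_docs])
        expansions.append(e_t)
        query = query + " " + e_t                  # q_t = q_{t-1} ⊕ e_t

    return build_final_query(q0, expansions)       # q_T*
```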

3. Architecture and Implementation

  • LLM Backbone: R1-distilled Qwen-14B (DeepSeek-R1-Distill-Qwen-14B), distilled to produce explicit chain-of-thought traces.
  • Expansion Candidate Diversity: For each round, temperature sampling (0.7) generates two distinct expansions, promoting result variety.
  • Prompting: Prompts present the original query, enumerate the retrieved documents, and instruct the LLM to "write a correct answering passage" emphasizing knowledge beyond the retrieved snippets.
  • Document Truncation: Documents limited to 128 tokens (DL19/DL20) or 512 tokens (BRIGHT) for computational tractability; see the sketch after this list.
  • Retrieval Engine: BM25 implementation via Pyserini.
  • Final Query Construction: After $T$ expansion rounds, repeated copies of the original query are appended according to the specified repetition strategy.
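
For the truncation step, a minimal sketch with a Hugging Face tokenizer (the model id is an assumption):

```python
# Sketch of document truncation before prompting.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-14B")

def truncate(doc: str, max_tokens: int = 128) -> str:
    """Keep the first max_tokens tokens (128 for DL19/DL20, 512 for BRIGHT)."""
    ids = tokenizer(doc, truncation=True, max_length=max_tokens)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)
```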

4. Empirical Evaluation and Results

4.1 Datasets and Metrics

  • TREC Deep Learning 2019/2020 (DL19/DL20): Large-scale web search over MS MARCO documents. Metrics: mean average precision (mAP), nDCG@10, Recall@1k.
  • BRIGHT: StackExchange-based corpus, emphasizing multi-faceted queries. Metric: nDCG@10 (defined below).
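
For reference, nDCG@10 follows the standard exponential-gain formulation (a textbook definition, not specific to this paper):

$\mathrm{nDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}, \qquad \mathrm{DCG@10} = \sum_{i=1}^{10} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i + 1)}$

where $\mathrm{rel}_i$ is the graded relevance of the document at rank $i$ and IDCG@10 is the DCG of the ideal ranking.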

4.2 Performance Comparison

| Method | DL19 mAP | DL19 nDCG@10 | BRIGHT nDCG@10 |
|---|---|---|---|
| BM25 | 30.1 | 50.6 | 17.0 |
| HyDE (14B) | 41.8 | 61.3 | – |
| LameR (14B) | 42.8 | 64.9 | 29.3 |
| ThinkQE (14B) | 45.9 | 68.8 | 34.9 |
| Rank-K-32B* | – | – | 37.9 |

*Rank-K-32B is a supervised R1-distilled reranker; ThinkQE is zero-shot and training-free.

ThinkQE consistently surpasses prior zero-shot QE systems (HyDE, LameR) by 3–4 nDCG@10 points and approaches or, in some domains, exceeds supervised dense retrievers (e.g., DPR, ANCE).

5. Component Ablations and Analysis

5.1 Effect of Reasoning Trace Generation

Explicit chain-of-thought modeling in the LLM accounts for an nDCG@10 increase of +2.7 on BRIGHT compared to identical architectures without reasoning. This demonstrates that explicit reasoning during expansion leads to richer semantic coverage.

5.2 Impact of Corpus Interaction

Iterative, feedback-based expansion (as opposed to generating all expansions in a single batch) further improves nDCG@10 by 2.4 points under fixed compute budgets, confirming the utility of interleaving retrieval and expansion rather than static scaling.

5.3 Accumulation and Redundancy Filtering

Accumulation (progressive augmentation of the expanded query) and redundancy filtering (exclusion of already-seen documents) are complementary: using both maximizes performance (nDCG@10 = 34.9); omitting either mechanism degrades effectiveness.

6. Insights, Limitations, and Future Prospects

Insights

  • Explicit chain-of-thought expansion, operationalized via R1-distilled LLMs, captures multiple latent interpretations, promoting semantic exploration.
  • Corpus-dependent evolution (dynamic interaction between retrieval and expansion) produces more diversified and nuanced final queries than static approaches, and can approach supervised reranking performance in zero-shot, training-free settings.

Limitations

  • ThinkQE entails increased inference time and API cost due to multiple LLM calls per query.
  • Experiments and validation are limited to English; transfer to multilingual and cross-lingual scenarios remains untested.

Future Directions

  • Adaptive early stopping and selection of informative expansion rounds to further reduce computational overhead.
  • Integration of parametric or dense retrieval scoring in the feedback loop.
  • Extension to multilingual, cross-lingual, or domain-adaptive retrieval tasks.
  • Learning to prioritize expansion terms or segments that provide maximal incremental retrieval benefit.

References

For detailed architectural, experimental, and theoretical background, see "ThinkQE: Query Expansion via an Evolving Thinking Process" (Lei et al., 10 Jun 2025).
