LLM-Based Query Expansion Methods

Updated 10 September 2025
  • LLM-based query expansion is a method leveraging large language models to enrich queries with semantically relevant terms, passages, or embeddings.
  • These techniques include explicit strategies like pseudo-document generation and implicit methods that refine query representations using corpus feedback.
  • Applications demonstrate improved IR metrics through enhanced semantic matching, domain adaptation, and context-aware query refinements.

LLM-based query expansion refers to a class of methods that employ pre-trained or instruction-tuned generative LLMs to synthesize, reweight, or refine queries—either by producing additional terms, pseudo-documents, or vector-space alterations—to improve information retrieval (IR) performance, especially in the presence of ambiguous, underspecified, or otherwise challenging user queries. LLM-driven query expansion draws on the models' knowledge, their ability to generalize, and contextualization capabilities, often augmenting or replacing traditional approaches based on lexical co-occurrence, thesauri, or statistical feedback.

1. Foundational Concepts and Taxonomy

LLM-based query expansion builds on the classic motivation of mitigating vocabulary mismatch between queries and documents. The defining feature is the injection of generative or representational capabilities from recent LLMs, which enables semantic, context-aware, and knowledge-enriched expansion strategies.

A comprehensive taxonomy (Li et al., 9 Sep 2025) distinguishes between:

  • Explicit QE: The model generates concrete expansion text (terms, passages, pseudo-documents) that is concatenated with the original query—seen in methods like Query2doc (Wang et al., 2023), HyDE, or QA-Expand (Seo et al., 12 Feb 2025).
  • Implicit QE: The model refines the query representation directly (e.g., by adjusting the embedding in dual-encoder space or via PRF-induced vector mixtures), without producing extra concatenated text, as in ANCE-PRF or SoftQE (Pimpalkhute et al., 20 Feb 2024).

Table: Major Taxonomic Dimensions

| Dimension | Examples | Description |
|---|---|---|
| Point of Injection | Explicit, Implicit | External expansion vs. feature-space update |
| Grounding | Zero-shot, Corpus-steered, KG | LLM prior, corpus feedback, structured data |
| Learning Alignment | SFT, DPO, Distillation | Fine-tuning LLMs for retrieval effectiveness |
| KG Integration | KAR (Xia et al., 17 Oct 2024), BMQExpander | Including structured, relational knowledge |

These axes often combine in practice—for example, a corpus-steered explicit expansion with additional SFT/DPO alignment and KG-based constraints for domain adaptation and hallucination mitigation.
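
To make the explicit/implicit distinction concrete, the following minimal sketch contrasts the two injection points. It is illustrative only: `llm_expand` and `encode` are placeholder callables (an LLM text generator and a dual-encoder query embedder), and the mixing weight of 0.7 is an assumption rather than a value from any cited paper.

```python
import numpy as np

def explicit_qe(query: str, llm_expand, n_copies: int = 3) -> str:
    """Explicit QE: concatenate LLM-generated text with the (repeated) query string."""
    pseudo_doc = llm_expand(query)  # e.g., a Query2doc/HyDE-style passage
    return " ".join([query] * n_copies + [pseudo_doc])

def implicit_qe(query: str, feedback_docs: list, encode, alpha: float = 0.7) -> np.ndarray:
    """Implicit QE: shift the query embedding toward feedback evidence; no extra text is emitted."""
    q_vec = encode(query)
    doc_vecs = np.stack([encode(d) for d in feedback_docs])
    return alpha * q_vec + (1.0 - alpha) * doc_vecs.mean(axis=0)
```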

2. Prominent Methodological Strategies

Pseudo-Document and Passage Generation

A dominant line involves directly prompting LLMs (decoder-only or encoder-decoder) to generate an expanded passage—a "pseudo-document" or a list of related terms:

  • Query2doc (Wang et al., 2023): A prompt (few-shot or zero-shot) is used to generate a passage d′ from the query q, forming an expanded query:

$$q^{+} = \mathrm{concat}(\{q\} \times n,\ d')$$

with $n$ repetitions of $q$ and $d'$ the generated pseudo-document (a minimal prompting sketch appears after this list).

  • Chain-of-Thought (CoT) Prompting (Jagerman et al., 2023): CoT-style prompts ("explain/expand step-by-step") elicit rationale-rich expansions containing diverse, contextually relevant keywords and explanations, further boosting recall.
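
The following is a minimal sketch of a Query2doc-style expansion under stated assumptions: the zero-shot prompt wording, the model name, and n = 5 repetitions are illustrative choices, not the exact setup of the cited paper, and any chat-capable LLM client could be substituted.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def query2doc_expand(query: str, n: int = 5, model: str = "gpt-4o-mini") -> str:
    """Query2doc-style expansion: q+ = concat({q} x n, d')."""
    prompt = (
        "Write a short passage that answers the following query.\n"
        f"Query: {query}\nPassage:"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    pseudo_doc = resp.choices[0].message.content.strip()
    # Repeating the query n times keeps its terms from being drowned out by the
    # much longer pseudo-document when the expanded string is fed to BM25.
    return " ".join([query] * n + [pseudo_doc])
```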

Multi-Faceted or Disentangled Expansion

More recent frameworks generate multiple expansions, each targeting a distinct facet of the underlying information need:

  • QA-Expand (Seo et al., 12 Feb 2025): Produces multiple sub-questions $q_i$ (via a generation module $G_Q$), then synthesizes corresponding pseudo-answers $a_i$. A feedback model $G_S$ rewrites/filters these, and aggregation (e.g., reciprocal rank fusion) ensures diverse, robust expansions:

$$S = G_S(\{(q_i, a_i)\}_{i=1}^{m},\ Q)$$

$$q^{*} = 0.7 \cdot \mathrm{emb}(Q) + 0.3 \cdot \frac{1}{|S|} \sum_{a_i' \in S} \mathrm{emb}(a_i')$$
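
A minimal sketch of this final fusion step is given below. How $\mathrm{emb}(\cdot)$ is computed (e.g., with a dense dual encoder) and how the sub-questions and refined answers are generated are left to the surrounding pipeline; the 0.7/0.3 weights come from the formula above.

```python
import numpy as np

def qa_expand_fuse(query_vec: np.ndarray, answer_vecs: list[np.ndarray],
                   w_query: float = 0.7, w_answers: float = 0.3) -> np.ndarray:
    """Fuse the query embedding with the mean of the refined pseudo-answer
    embeddings: q* = 0.7*emb(Q) + 0.3*(1/|S|)*sum_i emb(a_i')."""
    if not answer_vecs:
        return query_vec
    answer_mean = np.mean(np.stack(answer_vecs), axis=0)
    return w_query * query_vec + w_answers * answer_mean
```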

Corpus-Grounded and Iterative Expansion

Limiting hallucination and improving factuality is achieved by combining LLM outputs with corpus-derived signals:

  • Corpus-Steered Query Expansion (CSQE) (Lei et al., 28 Feb 2024): Uses initial retrieval (e.g., BM25) to extract corpus-sourced key sentences $S_i$ and combines them with LLM-generated expansions $H_j$ to form the final expanded query (a minimal sketch follows this list):

$$q_{\mathrm{expanded}} = q + \sum_{i=1}^{n} S_i + \sum_{j=1}^{m} H_j$$

  • Iterative/Cognitive Expansion (e.g., ThinkQE (Lei et al., 10 Jun 2025)): Introduces a "thinking" phase over retrieved documents followed by expansion; updates expansions in rounds with corpus feedback to ensure both coverage and result diversity.
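
A minimal sketch of the corpus-steered recipe under simplifying assumptions: the `rank_bm25` library stands in for the first-stage retriever, and the key-sentence step here is a crude lead-sentence heuristic, whereas CSQE has the LLM select the relevance-contributing sentences.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def csqe_expand(query: str, corpus: list[str], llm_expansions: list[str],
                top_docs: int = 3, sents_per_doc: int = 2) -> str:
    """Corpus-steered expansion: q_expanded = q + key sentences S_i + LLM expansions H_j."""
    tokenized = [doc.lower().split() for doc in corpus]
    bm25 = BM25Okapi(tokenized)
    scores = bm25.get_scores(query.lower().split())
    top_ids = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:top_docs]

    key_sentences = []
    for i in top_ids:
        # Crude stand-in: keep the leading sentences of each top-ranked document.
        sentences = [s.strip() for s in corpus[i].split(".") if s.strip()]
        key_sentences.extend(sentences[:sents_per_doc])

    return " ".join([query] + key_sentences + llm_expansions)
```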

Knowledge-Guided and Aligned Expansion

Structured knowledge and fine-tuning alignment are increasingly critical:

  • KG-based Expansion (Xia et al., 17 Oct 2024): Entities are extracted from $q$; document-based KG neighbors are scored via

$$S_{j,q} = \mathrm{Sim}(\mathbf{x}_j, \mathbf{X}_q)$$

and only the top-$k$ neighbors are forwarded for LLM-based expansion, ensuring both semantic and relational correctness (a neighbor-selection sketch follows this list).

  • Aligned Query Expansion (AQE) (Yang et al., 15 Jul 2025): LLMs are trained (via RSFT or DPO) to produce expansions $e$ that empirically maximize retrieval effectiveness (measured by the ranking of known relevant documents), removing the need for costly generate-then-filter schemes.
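
A sketch of the neighbor-selection step, assuming $\mathbf{x}_j$ is a candidate KG-neighbor embedding and $\mathbf{X}_q$ the matrix of query-entity embeddings. Implementing Sim as a max-cosine over the query entities, and k = 5, are assumptions; how entities and candidate neighbors are extracted from the KG is outside this snippet.

```python
import numpy as np

def select_kg_neighbors(query_entity_matrix: np.ndarray,
                        neighbor_vecs: dict[str, np.ndarray],
                        k: int = 5) -> list[str]:
    """Keep the top-k KG neighbors by S_{j,q} = Sim(x_j, X_q) before prompting the LLM."""
    def sim(x_j: np.ndarray, X_q: np.ndarray) -> float:
        # Cosine similarity against the best-matching query-entity embedding.
        norms = np.linalg.norm(X_q, axis=1) * np.linalg.norm(x_j) + 1e-9
        return float(np.max((X_q @ x_j) / norms))
    ranked = sorted(neighbor_vecs.items(),
                    key=lambda kv: sim(kv[1], query_entity_matrix), reverse=True)
    return [name for name, _ in ranked[:k]]
```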

3. Model Architectures and Training Paradigms

LLM-based QE draws on various model families (Li et al., 9 Sep 2025):

  • Encoder-only (e.g., BERT): Ideal for contextual embedding, token-level refinement, and PRF via representational updates.
  • Encoder-decoder (e.g., T5): Supports explicit generation and controlled expansion with flexibility for instruction tuning.
  • Decoder-only (e.g., GPT-3/4, LLaMA): Facilitates zero-/few-shot open-ended expansion, multi-step reasoning, or “hypothetical document” synthesis.
  • Instruction-tuned: Enhanced alignment and controllability for prompting or task-specific generation.
  • Domain-specific and multilingual models (BioBERT, SciBERT, multilingual variants): critical for technical or cross-lingual expansions.

Training regimes have evolved towards task-aligned fine-tuning:

  • SFT/DPO/alignment: Models are updated such that generated expansions directly improve retrieval metrics, e.g., via contrastive learning, distillation (SoftQE (Pimpalkhute et al., 20 Feb 2024)), or preference optimization (DPO).
  • Distillation: Offline LLM-generated expansions are integrated into standard query encoders, allowing latency reduction at inference (as in SoftQE).
  • Joint Optimization: Frameworks like ExpandR (Yao et al., 24 Feb 2025) optimize both the LLM and retriever simultaneously, aligning generation with ranking preference signals.
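
As a rough illustration of the distillation idea behind SoftQE summarized above, the PyTorch sketch below pulls a student query encoder toward a frozen teacher's embedding of the LLM-expanded query, so the LLM can be dropped at inference time. The encoder interfaces (callables mapping a batch of strings to embedding tensors) and the plain MSE objective are simplifying assumptions; the actual method may combine this with contrastive retrieval losses.

```python
import torch
import torch.nn.functional as F

def distillation_step(student_encoder, teacher_encoder, optimizer,
                      raw_queries: list[str], expanded_queries: list[str]) -> float:
    """One distillation step: match the student's embedding of the raw query to
    the frozen teacher's embedding of the (offline) LLM-expanded query."""
    with torch.no_grad():
        target = teacher_encoder(expanded_queries)   # offline, LLM-expanded queries
    pred = student_encoder(raw_queries)              # online path: raw query only
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```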

4. Performance, Challenges, and Evaluation

LLM-based QE frequently yields substantial improvements in standard IR metrics (MAP, nDCG@10, Recall@k):

  • BM25 gains: Query2doc (Wang et al., 2023) reports 3–15% improvement; CSQE shows robust increases across TREC DL and BEIR.
  • Dense retriever improvements: Jointly optimized or alignment-based expansions (ExpandR, AQE) achieve 5–8% improvement versus non-expanded baselines.
  • Domain-specific: Ontology-guided approaches (BMQExpander (Nazi et al., 15 Aug 2025)) achieve up to 22.1% NDCG@10 improvement over BM25 in biomedical IR, showing enhanced robustness to distribution shifts and query perturbations.
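
For reference, a minimal implementation of the nDCG@k metric quoted throughout these results (linear-gain variant; an exponential-gain variant, 2^rel − 1, is also common):

```python
import math

def ndcg_at_k(ranked_doc_ids: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """nDCG@k: DCG of the system ranking divided by the DCG of an ideal reordering
    of the judged documents (linear gains)."""
    def dcg(gains):
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```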

Challenges:

  • Hallucination and Drift: LLM generations can invent plausible but spurious entities; grounding with corpus or knowledge graphs mitigates, but does not eliminate, this risk (Lei et al., 28 Feb 2024, Nazi et al., 15 Aug 2025).
  • Knowledge Leakage: Performance on benchmarks may be inflated if LLMs regurgitate memorized evidence (knowledge leakage), limiting generalizability to unseen domains or post-training data (Yoon et al., 19 Apr 2025).
  • Ambiguity and Unknowns: If LLM knowledge is insufficient, expansion may degrade retrieval by yielding irrelevant or misleading content; ambiguity can yield biased refinements (Abe et al., 19 May 2025).
  • Efficiency and Cost: Generating long pseudo-documents or running multiple sampling/re-ranking passes increases inference cost and latency; newer methods like CTQE (Kim et al., 2 Sep 2025) upcycle candidate tokens from a single decoding pass for high efficiency.

5. Comparative Analyses and Real-World Deployment Considerations

Comparing LLM-based expansion to traditional QE (Li et al., 9 Sep 2025):

  • Contextualization and Adaptability: LLM-based methods excel in contextual and semantic matching, supporting nuanced or conversational queries; traditional methods are more robust in efficiency and in resource-constrained domains.
  • Domain Adaptation: Domain-specific models (BioBERT, SciBERT) or ontology-guided frameworks (BMQExpander) bridge vocabulary gaps and improve robustness—supporting e.g., biomedical, legal, or e-commerce search.
  • System Integration: LLM-generated expansions enable new paradigms in retrieval-augmented generation (RAG), multi-query fusion (e.g., Exp4Fuse (Liu et al., 5 Jun 2025)), and profile-aware system evaluation (demographically-inspired QE (Alaofi et al., 25 Aug 2025)).
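
Multi-query fusion of the kind mentioned above is commonly realized with reciprocal rank fusion. The sketch below uses the standard RRF formula with the customary constant k = 60; Exp4Fuse's exact fusion scheme may differ.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g., runs for the original query and for each
    LLM-expanded variant): score(d) = sum over runs of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```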

Relevant real-world implementations often involve hybrid strategies—combining LLM-generated expansions, explicit corpus feedback, preference alignment, and external knowledge integration—to balance precision, recall, computational cost, and domain robustness.

6. Open Problems and Research Directions

The survey (Li et al., 9 Sep 2025) identifies several active areas for further research:

  • Quality and Safety Controls: Systematic filtering, grounding, and alignment to minimize hallucination and topic drift.
  • Adaptive Invocation: Dynamic selection/invocation of QE pipelines based on pre-scoring knowledge sufficiency, ambiguity, and risk of degradation (Abe et al., 19 May 2025).
  • Cost-Aware Design: Techniques such as candidate token upcycling (CTQE), compressed representation alignment (SoftQE), and efficient fusion ranking (Exp4Fuse) to support deployment in latency- and resource-sensitive environments.
  • Evaluation Beyond nDCG: Incorporation of fairness/privacy, demographic diversity in benchmarking (demographically-inspired QE), and task-oriented metrics for RAG and cross-lingual retrieval.
  • Hybrid and Knowledge-Constrained Models: Deeper synthesis of KG signals, document-based relation filtering, and multi-step (iterative “thinking”) expansion to further close the semantic gap without sacrificing controllability.

In sum, LLM-based query expansion constitutes a broad, evolving family of techniques that leverage large, generalizable LLMs for semantically and contextually richer IR. Through the interplay of generative modeling, task alignment, corpus/knowledge grounding, and explicit evaluation of limitations, these methods are redefining the design landscape for robust and adaptable retrieval in the era of large-scale language modeling.