Multi-Query Generation Techniques
- Multi-query generation is a paradigm that systematically decomposes complex information needs into diverse, optimized queries to capture multiple facets of user intent.
- It leverages techniques like submodular optimization, beam search, and mixture-of-experts to improve efficiency in SQL analytics, conversational search, and retrieval-augmented generation.
- Applications span database systems, knowledge graphs, and LLM-driven conversational AI, resulting in enhanced query coverage, reduced latency, and improved system performance.
Multi-query generation is a fundamental paradigm across information retrieval, database optimization, conversational search, knowledge-augmented generation, and multi-modal inference. It refers to systematically generating, optimizing, and executing sets of queries—often formulated to cover multiple facets, user intents, or computational sub-goals—such that resource sharing, expressiveness, and retrieval coverage are maximized. This synthesis surveys the principles, algorithmic frameworks, and system architectures developed for multi-query generation in large-scale analytics, information retrieval, natural language to SQL, graph streaming, bundle generation, and retrieval-augmented LLMs, with a focus on technical innovation and performance implications.
1. Motivation and Core Definitions
Multi-query generation arises from the need to:
- Efficiently process batches of structurally related queries or queries sharing common computation, as in warehouse and distributed systems (Kathuria et al., 2015, Rafay, 2016)
- Disambiguate complex or ambiguous user intents into aspect-specific queries for information retrieval and conversational assistants (Abbasiantaeb et al., 28 Mar 2024, Lupart et al., 22 Nov 2024, Kostric et al., 27 Jun 2024)
- Enhance database and API query translation from natural language, accommodating domain or dialect variation and complex logical intent (Kelkar et al., 2020, Lin et al., 24 Oct 2024, Borchmann et al., 31 Mar 2025)
- Augment LLM-based retrieval and reasoning by decomposing subproblems, improving grounding, and mitigating hallucinations (Tan et al., 30 Jun 2025, Wei et al., 7 Jul 2025, Lin et al., 24 Oct 2024)
- Exploit repeated subexpressions and data-sharing opportunities to minimize computational overhead (Kathuria et al., 2015, Rafay, 2016, Mayer et al., 2018)
A multi-query system either decomposes a complex information need into a set of diverse queries, generates multiple rewrites or expansions of the same query, or computes a global plan for a batch of submissions.
2. Algorithmic Strategies and System Designs
2.1 Submodular and Greedy Optimization
In SQL and batch analytics, multi-query optimization is formalized as a submodular maximization problem (Kathuria et al., 2015). Queries are analyzed to extract common subexpressions, and the system seeks a set S of nodes (intermediate results) to materialize, maximizing the benefit function mb(S) = bc(∅) − bc(S), where bc(S) denotes the best total plan cost when S is materialized. The MarginalGreedy algorithm incrementally selects subexpressions whose marginal benefit/cost ratio exceeds a threshold, which yields approximation guarantees. Efficiency improvements, such as pruning and lazy evaluation, are integrated into optimizers like the Volcano/Cascades framework.
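The greedy loop can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm; the `best_cost` and `cost_of` callbacks stand in for the optimizer's cost model:

```python
def marginal_greedy(candidates, best_cost, cost_of, threshold=1.0):
    """Greedy selection of subexpressions to materialize; a simplified
    sketch of the MarginalGreedy idea.

    candidates: hashable subexpression ids that could be materialized
    best_cost:  callback mapping a frozenset S to bc(S), the best total
                plan cost given that S is materialized (assumed to be
                supplied by the optimizer)
    cost_of:    callback mapping a node to its materialization cost
    """
    selected = set()
    while True:
        best_node, best_ratio = None, threshold
        current = best_cost(frozenset(selected))
        for node in candidates - selected:
            # marginal benefit of adding `node`: bc(S) - bc(S ∪ {node})
            gain = current - best_cost(frozenset(selected | {node}))
            cost = cost_of(node)
            ratio = gain / cost if cost > 0 else float("inf")
            if ratio > best_ratio:
                best_node, best_ratio = node, ratio
        if best_node is None:
            return selected          # no remaining node clears the threshold
        selected.add(best_node)
```

The benefit/cost threshold prunes materializations whose reuse savings would not repay their storage and maintenance cost.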
2.2 Join Plan Generation and Parallelization
In distributed SQL-on-Hadoop systems, batching queries (with coverage constraints) allows plan construction such that common joins and selections are computed once (Rafay, 2016). Cost-based dynamic programming considers satisfiability and cost estimates — using standard cardinality estimation formulas such as |R ⋈ S| ≈ |R| · |S| / max(V(R, A), V(S, A)) for an equi-join on attribute A — to assemble shared join graphs executed across parallel worker nodes.
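The classic equi-join size estimate underlying such cost models is a one-liner; a minimal sketch (the exact estimator used in the cited system is not specified here):

```python
def join_cardinality(card_r, card_s, distinct_r, distinct_s):
    """Classic equi-join size estimate used in cost-based plan search:
    |R ⋈ S| ≈ |R| * |S| / max(V(R, A), V(S, A)), where V(X, A) is the
    number of distinct values of join attribute A in relation X."""
    return card_r * card_s / max(distinct_r, distinct_s)
```

The dynamic program compares such estimates across candidate shared join orders to pick the cheapest global plan for the batch.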
2.3 Beam Search and Query Rewrite Aggregation
Neural sequence rewriters for conversational search (e.g., T5-based models) leverage beam search to cheaply generate multiple de-contextualized rewrites per utterance (Kostric et al., 27 Jun 2024). These candidate rewrites are integrated:
- In sparse retrieval, via weighted bag-of-words queries with term weights derived from beam scores.
- In dense retrieval, by computing a weighted centroid vector, allowing efficient one-shot embedding lookup.
Empirically, this strategy surpasses single-rewrite approaches in retrieval accuracy without additional latency.
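The dense-retrieval fusion reduces to a score-weighted average of rewrite embeddings; a minimal sketch, where the `embed` callback is an assumed stand-in for the dense encoder:

```python
import math

def weighted_centroid(rewrites, embed):
    """Fuse beam-search rewrites into a single dense query vector.
    `rewrites` is a list of (text, beam_log_prob) pairs; `embed` maps
    text to an embedding (a plain list of floats here)."""
    weights = [math.exp(lp) for _, lp in rewrites]   # beam log-probs -> weights
    total = sum(weights)
    vecs = [embed(text) for text, _ in rewrites]
    dim = len(vecs[0])
    # weighted average of rewrite embeddings = one query vector, one lookup
    return [sum(w * v[i] for w, v in zip(weights, vecs)) / total
            for i in range(dim)]
```

Because the centroid is a single vector, the index is probed once regardless of beam width, which is why latency matches the single-rewrite baseline.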
2.4 Multi-expert and Multi-dialect Approaches
Multi-dialect query generation for database systems introduces Mixture-of-Experts (MoE) architectures (Lin et al., 24 Oct 2024). LoRA-based expert modules are designated per dialect; a dialect router assigns segments of the input to dialect-specific or shared experts, coordinated by (smoothed) routing and load balancing objectives to handle both high-resource and under-represented syntaxes.
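A toy router illustrates the division of labor between dialect-specific and shared experts; the fixed round-robin schedule below is a deliberate simplification of MoMQ's learned routing and load-balancing objectives:

```python
def route_tokens(tokens, dialect, experts, shared_expert, shared_every=4):
    """Toy dialect router: most tokens go to the dialect-specific expert,
    every `shared_every`-th token to a shared expert. Real systems learn
    this assignment jointly with a load-balancing loss."""
    out = []
    for i, tok in enumerate(tokens):
        if (i + 1) % shared_every == 0:
            out.append(shared_expert(tok))      # cross-dialect knowledge
        else:
            out.append(experts[dialect](tok))   # dialect-specific LoRA expert
    return out
```

The shared expert is what lets low-resource dialects borrow structure learned from high-resource ones.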
2.5 Multi-aspect and Parallel Query Generation via LLMs
Conversational information seeking is addressed by decomposing user utterances into multiple aspect-focused queries, using LLMs to first generate a comprehensive answer and then derive per-aspect queries (AQD) (Abbasiantaeb et al., 28 Mar 2024). These queries are each used for evidence retrieval, followed by aggregation or secondary reranking. For retrieval-augmented generation, multi-query parallelism reduces reasoning latency: during QA, the model generates several search queries in one step, with the retrieval module returning JSON-mapped results (Tan et al., 30 Jun 2025).
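The parallel-retrieval step can be sketched as a fan-out over the queries emitted in one generation step, returning the JSON mapping described above; `search_fn` is an assumed retrieval callback:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def parallel_retrieve(queries, search_fn, max_workers=4):
    """Issue several search queries generated in a single reasoning step
    and return a JSON string mapping each query to its results. A sketch
    of the interface, not the cited system's implementation."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(search_fn, queries))   # fan out in parallel
    return json.dumps(dict(zip(queries, results)))
```

Issuing the queries concurrently rather than one reasoning turn at a time is the source of the latency savings reported for this style of RAG.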
2.6 Knowledge Graph Fusion and Entity-driven Expansion
Retrieval-augmented LLMs benefit from query expansion over multi-path knowledge graphs (Wei et al., 7 Jul 2025). LLM-extracted entity-relational triples are organized into multiple subgraphs (one-hop, multi-hop, and personalized PageRank-based), ranked and fused by a query-aware attention reward model that scores each triple’s semantic relevance. High-scoring triples are then used to expand the initial query vector, improving both retrieval coverage and downstream generation.
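A simplified stand-in for the final expansion step: score each triple embedding against the query, keep the top-k, and mix their mean into the query vector (the dot-product scoring and the mixing weight `alpha` are illustrative substitutes for the learned attention reward model):

```python
def expand_query(query_vec, triple_vecs, top_k=2, alpha=0.3):
    """Expand a query embedding with its highest-scoring KG triples.
    Triples are scored by dot product with the query, the top_k are
    averaged, and the mean is blended into the query with weight alpha."""
    score = lambda v: sum(q * t for q, t in zip(query_vec, v))
    kept = sorted(triple_vecs, key=score, reverse=True)[:top_k]
    mean = [sum(v[i] for v in kept) / len(kept) for i in range(len(query_vec))]
    return [(1 - alpha) * q + alpha * m for q, m in zip(query_vec, mean)]
```

Keeping only high-scoring triples is what bounds semantic drift while still pulling in multi-hop evidence.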
3. Effectiveness, Performance, and Evaluation
- In multi-query optimization for SQL, greedy submodular maximization achieves approximation factors close to the theoretical optimum, and integration into existing optimizers incurs only modest overhead while delivering substantial reductions in execution cost (Kathuria et al., 2015).
- Shared join plan generation in systems like GLADE demonstrates reduced data scans, improved response times, and adaptive batch execution for SQL-on-Hadoop workloads (Rafay, 2016).
- Beam-based multi-query rewriting methods (CMQR) yield MRR improvements of 1–6 percentage points in sparse settings and up to 4.5 points in dense retrieval (Kostric et al., 27 Jun 2024).
- LLM-driven multi-aspect query generation combined with learned sparse retrieval and cross-encoder reranking in MQ4CS increases both recall and final ranking precision, outperforming human-written rewrites in personalized conversational search scenarios (Lupart et al., 22 Nov 2024).
- MoMQ demonstrates 3–5 percentage point gains in execution accuracy for multi-dialect text-to-SQL, with greater robustness in resource-imbalanced settings (Lin et al., 24 Oct 2024).
- Multi-query RAG methods (RAG-R1) deliver up to 13% improvement in factual QA accuracy along with more than 11% inference time reduction due to parallel query issuance (Tan et al., 30 Jun 2025).
- Knowledge graph-based query expansion (QMKGF) surpasses state-of-the-art rerankers by 9.72% ROUGE gain on HotpotQA (Wei et al., 7 Jul 2025).
4. Applications Across Modalities and Domains
- SQL/Database Systems: Cost-efficient multi-query batch processing, cross-dialect query translation, and personalized ad hoc reporting (Kathuria et al., 2015, Rafay, 2016, Kelkar et al., 2020, Lin et al., 24 Oct 2024).
- Conversational Search and QA: Multi-faceted query rewriting, aspect-specific passage retrieval and reranking, and dynamic reasoning in LLM-augmented dialog (Abbasiantaeb et al., 28 Mar 2024, Kostric et al., 27 Jun 2024, Lupart et al., 22 Nov 2024, Tan et al., 30 Jun 2025).
- Graph Analytics: Selection/join optimization, partition-aware multi-query locality in graph streams, and continuous subscription evaluation (Mayer et al., 2018, Zervakis et al., 2019).
- Retrieval-Augmented Generation: Knowledge graph-informed expansion tailored to query semantics, reducing hallucination and supporting evidence-grounded generative tasks (Wei et al., 7 Jul 2025).
- Personalized Recommendation and Bundle Generation: Explicit decomposition of user queries into fine-grained goals, deep Q-learning-based combinatorial generation for diverse and complementary item sets (Zhu et al., 2023).
5. Challenges and Future Directions
Several limitations and challenges are documented:
- Format Robustness: Not all LLMs reliably handle multi-query instruction (array-of-JSON) outputs, with most open-source LLMs lagging behind commercial models like GPT-4 in structured response rates (Laskar et al., 29 Feb 2024).
- Coverage and Redundancy: Ensuring that generated multi-queries collectively capture all critical aspects without redundancy or omission remains an open issue (Kostric et al., 27 Jun 2024, Seo et al., 12 Feb 2025).
- Scaling and Latency: Parallel multi-query generation must be balanced with computational and memory constraints in dense retrieval or knowledge fusion settings (Tan et al., 30 Jun 2025, Wei et al., 7 Jul 2025).
- Personalization: Effective adaptation of personalization signals (e.g., PTKB) into every stage of multi-query generation and reranking continues to be refined (Lupart et al., 22 Nov 2024, Zhu et al., 2023).
- Semantic Representation: Fusing diverse knowledge paths (one-hop, multi-hop, PageRank) for robust semantic expansion necessitates advanced filtering to control semantic drift (Wei et al., 7 Jul 2025).
- Resource Imbalance and Generalization: MoE-based large models must propagate knowledge from high- to low-resource dialects while avoiding interference and catastrophic forgetting (Lin et al., 24 Oct 2024).
Emerging directions encompass deeper integration of reward-driven RL for multi-query inference, further coupling of graph expansion with LLM-based prompting, adaptive determination of the optimal number of queries to generate per context, and extension to broader code-synthesis and multi-modal tasks.
6. Summary Table: Principal Multi-Query Generation Frameworks
Paper/Framework | Core Technique | Primary Application Area |
---|---|---|
MarginalGreedy (Kathuria et al., 2015) | Submodular Greedy Optimization | SQL MQO, batch analytics |
GLADE MQO (Rafay, 2016) | Cost-based Shared Join Plan Generation | SQL-on-Hadoop systems |
TriC (Zervakis et al., 2019) | Trie-based Subgraph Indexing | Continuous graph stream MQP |
MSQG (Cho et al., 2019) | Multi-source Aggregation in Seq2Seq | Common question generation |
CMQR (Kostric et al., 27 Jun 2024) | Beam Search Multi-Query Rewriting | Conversational passage retrieval |
MoMQ (Lin et al., 24 Oct 2024) | MoE with Dialect-specific Expert Routing | Multi-dialect SQL/text-to-query |
MQ4CS (Lupart et al., 22 Nov 2024, Abbasiantaeb et al., 28 Mar 2024) | LLM aspect decomposition + reranking | Conversational search and retrieval |
RAG-R1 (Tan et al., 30 Jun 2025) | RL-enhanced Multi-query Parallelism | Retrieval-Augmented QA with LLMs |
QMKGF (Wei et al., 7 Jul 2025) | Multi-Path KG Expansion, Attention Fusion | Retrieval-augmented query expansion |
7. Conclusion
Multi-query generation, in its various algorithmic and representational forms, constitutes a critical foundation for the next generation of efficient, expressive, and robust information systems. Across domains ranging from declarative analytics and database systems to retrieval-augmented conversational AI and recommendation, recent advances exploit greedy maximization, mixture-of-expert modularity, deep reinforcement learning, knowledge graph fusion, and LLM-guided decomposition to push the frontier of tractable, adaptive, and comprehensive query handling. Ongoing challenges include format robustness, multi-dialect generalization, and the semantics-pragmatics interface in multi-aspect expansion. As such, multi-query generation is poised to remain a central and rapidly evolving research direction in large-scale data and knowledge management.