Query Fusion Techniques
- Query Fusion is a family of techniques that aggregate multiple query results or variants to yield more effective, robust, and semantically aligned outputs across various applications.
- It encompasses methods from rank-based schemes like CombSUM and RRF to neural architectures and adaptive fusion in multi-modal and LLM-driven environments.
- Modern implementations enhance performance metrics in IR, computer vision, and recommender systems by addressing noisy inputs, modality gaps, and scalable fusion weight learning.
Query fusion is a family of techniques aimed at aggregating information from multiple queries, query variants, or modalities to produce results that are more effective, robust, or semantically aligned with target tasks. Its application spans information retrieval, recommender systems, computer vision, multi-modal detection, few-shot learning, LLMs, and knowledge-augmented reasoning. The methodological landscape of query fusion includes early rank-based fusion schemes, probabilistic learning approaches, query-attentive neural fusion layers, adaptive multi-stage reranking, and cross-modal query propagation.
1. Classical Query Fusion in Information Retrieval
Query fusion originated in IR as an extension of rank fusion, where multiple retrieval results for a common topic, produced by rerunning the system with different queries or systems, are combined into a single output ranking. Early algorithms focused primarily on unsupervised rank aggregation operations:
- CombSUM: Scores documents by summing retrieval scores across all query sources.
- CombMNZ: Multiplies CombSUM by the count of sources where a document appears, boosting consensus hits.
- Reciprocal Rank Fusion (RRF): Aggregates rankings by adding reciprocal ranks across sources: $\mathrm{RRF}(d) = \sum_{s \in S} \frac{1}{k + \mathrm{rank}_s(d)}$, where $\mathrm{rank}_s(d)$ is the rank of document $d$ in source $s$ and $k$ is a smoothing constant, conventionally set to 60. A minimal implementation of all three schemes is sketched below.
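These three schemes are simple enough to state directly in code. The following is a minimal sketch in plain Python; the input formats (score dictionaries for the Comb* methods, best-first doc-id lists for RRF) are illustrative conventions, not any particular library's API.

```python
from collections import defaultdict

def combsum(score_lists):
    """CombSUM: sum each document's (ideally normalized) scores across sources."""
    fused = defaultdict(float)
    for scores in score_lists:              # each: {doc_id: score}
        for doc, s in scores.items():
            fused[doc] += s
    return dict(fused)

def combmnz(score_lists):
    """CombMNZ: CombSUM score times the number of sources retrieving the doc."""
    sums, hits = defaultdict(float), defaultdict(int)
    for scores in score_lists:
        for doc, s in scores.items():
            sums[doc] += s
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}

def rrf(rankings, k=60):
    """RRF(d) = sum over sources of 1 / (k + rank_s(d)); k = 60 is conventional."""
    fused = defaultdict(float)
    for ranking in rankings:                # each: [doc_id, ...], best first
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + rank)
    return dict(fused)

# Example: d2 appears near the top of both runs and wins the fused ranking.
scores = rrf([["d1", "d2", "d3"], ["d2", "d3", "d4"]])
print(sorted(scores, key=scores.get, reverse=True))   # ['d2', 'd3', 'd1', 'd4']
```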
ProbFuse, a probabilistic data fusion algorithm, leverages training data to estimate positional relevance probabilities for each system via segmented “rank buckets.” At test time, each document's score is a weighted sum of probabilities from the segments it appears in, explicitly favoring early ranks and rewarding consensus ("Chorus," "Skimming," and "Dark-Horse" effects). ProbFuse consistently outperforms CombMNZ, delivering relative MAP improvements of 20–50% on TREC benchmarks with minimal overhead (Lillis et al., 2014).
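The sketch below compresses the ProbFuse procedure just described into a training function and a scoring function. The segment count and the data layouts for runs and relevance judgments are illustrative assumptions, not the reference implementation.

```python
import math
from collections import defaultdict

NUM_SEGMENTS = 25  # number of "rank buckets" per ranked list; a tunable assumption

def segment_of(rank, list_len, x=NUM_SEGMENTS):
    """Map a 1-based rank to its segment index in 1..x."""
    return min(x, math.ceil(rank / math.ceil(list_len / x)))

def train_probfuse(training_runs, qrels, x=NUM_SEGMENTS):
    """Estimate P(relevant | system, segment) from judged training rankings.

    training_runs: {system: {query: [doc_id, ...]}}; qrels: {query: set of relevant ids}.
    """
    probs = {}
    for system, runs in training_runs.items():
        rel = [0.0] * (x + 1)
        tot = [0] * (x + 1)
        for query, ranking in runs.items():
            for rank, doc in enumerate(ranking, start=1):
                seg = segment_of(rank, len(ranking), x)
                tot[seg] += 1
                rel[seg] += doc in qrels.get(query, set())
        probs[system] = [rel[k] / tot[k] if tot[k] else 0.0 for k in range(x + 1)]
    return probs

def probfuse_score(test_runs, probs, x=NUM_SEGMENTS):
    """score(d) = sum over systems of P(rel | system, segment(d)) / segment(d).

    Dividing by the segment index favors early ranks; summing across systems
    rewards consensus, matching the effects described above.
    """
    fused = defaultdict(float)
    for system, ranking in test_runs.items():
        for rank, doc in enumerate(ranking, start=1):
            seg = segment_of(rank, len(ranking), x)
            fused[doc] += probs[system][seg] / seg
    return dict(fused)
```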
Offline centroid-based fusion strategies improve throughput: query variations for broad topics are grouped into clusters, their results are fused, and the fused centroid runs are cached so that incoming user queries can be boosted online via quick reranking or interleaving. These approaches incur negligible latency overhead (<3 ms/query) and deliver NDCG@10 improvements of 23–43% over standard IR baselines (Benham et al., 2018).
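A compact sketch of this cache-and-boost pattern, assuming precomputed variant embeddings and cluster assignments; the RRF fusion inside each cluster, the dot-product centroid lookup, and the round-robin interleaving policy are illustrative stand-ins for whatever similarity and merging scheme a deployment actually uses.

```python
import numpy as np

def build_cache(variant_embeddings, variant_runs, cluster_ids, k=60):
    """Offline step: fuse each cluster's runs (RRF) and store
    (centroid embedding, fused ranking) pairs for online lookup."""
    cache = []
    for cid in set(cluster_ids):
        members = [i for i, c in enumerate(cluster_ids) if c == cid]
        centroid = np.mean([variant_embeddings[i] for i in members], axis=0)
        scores = {}
        for i in members:
            for rank, doc in enumerate(variant_runs[i], start=1):
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
        cache.append((centroid, sorted(scores, key=scores.get, reverse=True)))
    return cache

def online_boost(query_vec, live_ranking, cache, depth=10):
    """Online step: find the nearest cached centroid (dot-product similarity)
    and interleave its cached ranking with the live run."""
    _, cached = max(cache, key=lambda entry: float(np.dot(query_vec, entry[0])))
    merged, seen = [], set()
    for pair in zip(cached[:depth], live_ranking[:depth]):
        for doc in pair:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged + [d for d in live_ranking if d not in seen]
```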
Recent IR query fusion integrates synthetic query variants produced by instruction-tuned LLMs, applying RRF or CombSUM over the resulting rankings. On TREC newswire tasks, fusing as few as 10 synthetic variants per topic improves nDCG@10 and MAP by up to 40–50% versus a single query, outperforming pseudo-relevance feedback (Breuer, 2024).
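A hedged sketch of this pipeline, where `generate_variants` (any instruction-tuned LLM behind a prompt) and `search` (any BM25 or dense retriever returning a best-first doc-id list) are hypothetical placeholders; the fusion step is plain RRF as defined earlier.

```python
def fused_run(topic, n_variants=10, depth=1000, k=60):
    """Fuse retrieval runs for an original topic plus LLM-generated variants."""
    variants = generate_variants(topic, n=n_variants)        # hypothetical LLM call
    runs = [search(q, k=depth) for q in [topic] + variants]  # hypothetical retriever
    scores = {}
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)  # RRF
    return sorted(scores, key=scores.get, reverse=True)[:depth]
```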
2. Neural and Multi-Modal Query Fusion
Query fusion has been extended to neural architectures for multi-modal and multi-instance perception:
- Hierarchical Query Fusion Decoders: In transformer-based 3D instance segmentation, hierarchical fusion mechanisms recover queries (“object proposals”) that are unstable or suffer from mask collapse: low IoU overlap between consecutive decoder layers is detected explicitly, and the affected queries are reinserted for further refinement. This strategy maintains recall and AP across decoder depth, mitigating query disappearance and collapse (Lu et al., 6 Feb 2025).
- Modality-Decoupled RGB-Thermal Detection: For robust RGB-Thermal object detection, modality decoupling is implemented with DETR-style detectors, where high-confidence queries from either modality are adapted via cross-modal MLPs and selected through top-k gating (sketched after this list). Fused queries minimize the impact of sensor noise while the two branches remain independent, which permits optimization with unpaired modality data and preserves branch autonomy when a modality is missing (Tian et al., 13 Jan 2026).
- Radar–Camera Fusion with Query-Based Transformer Architectures: In outdoor 3D detection, query-based fusion frameworks sample adaptive instance-centric features from both BEV and image views. Object queries are distributed in polar coordinates with density adaptive to spatial context, and are updated via dual-view deformable attention and radar-guided depth heads, resulting in state-of-the-art performance and resilience under sensor failure (Chu et al., 2024).
- Query-Based Temporal-Spatial LiDAR Fusion: Cooperative multi-agent 3D detection employs query-based fusion pipelines where object queries extracted from sparse BEV features are temporally aligned using query position and timestamp, then fused across agents via cross-attention. This framework tolerates asynchronous sensor clocks and delivers accurate, temporally predictive object locations (Yuan et al., 2024).
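As a concrete illustration of the query-level gating mentioned in the RGB-Thermal bullet above, the following PyTorch sketch adapts each modality's decoder queries with a small cross-modal MLP and keeps the k most confident ones. The dimensions, the linear confidence head, and the missing-modality handling are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CrossModalQueryFusion(nn.Module):
    """Adapt per-modality queries, score them, and keep the top-k overall."""

    def __init__(self, dim=256, k=100):
        super().__init__()
        self.k = k
        self.rgb_adapt = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.thermal_adapt = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.confidence = nn.Linear(dim, 1)   # assumed scalar confidence head

    def forward(self, q_rgb=None, q_thermal=None):
        # q_rgb, q_thermal: (batch, num_queries, dim); either may be None,
        # which mimics operating with a missing modality.
        pools = []
        if q_rgb is not None:
            pools.append(self.rgb_adapt(q_rgb))
        if q_thermal is not None:
            pools.append(self.thermal_adapt(q_thermal))
        pool = torch.cat(pools, dim=1)                    # (B, N_total, dim)
        conf = self.confidence(pool).squeeze(-1)          # (B, N_total)
        idx = conf.topk(min(self.k, pool.size(1)), dim=1).indices
        # Gather the k most confident fused queries per batch element.
        return torch.gather(pool, 1, idx.unsqueeze(-1).expand(-1, -1, pool.size(-1)))
```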
3. Query Fusion in Retrieval-Augmented Generation and Knowledge Graphs
- Query-Aware Graph Fusion in RAG: Recent methods build knowledge graphs from unstructured corpora using prompt-driven LLM entity/relation extraction, then construct query-centered, multi-path subgraphs (one-hop, multi-hop, PageRank important nodes). A query-aware attention model scores subgraphs for semantic relevance, allowing fusion of high-scoring triples and expansion of the original query. This produces a semantically enriched chunk pool, enhancing LLM factuality and ROUGE scores by 7–10 percentage points over standard rerankers in multi-hop QA tasks (Wei et al., 7 Jul 2025).
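A minimal sketch of the query-aware scoring-and-expansion step, assuming triple and query embeddings are already produced by some encoder; the multi-head attention scorer and the bracketed expansion format are illustrative choices rather than the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class SubgraphScorer(nn.Module):
    """Score candidate subgraph triples against the query via attention weights."""

    def __init__(self, dim=768):
        super().__init__()
        self.att = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, query_emb, triple_embs):
        # query_emb: (1, 1, dim); triple_embs: (1, num_triples, dim)
        _, weights = self.att(query_emb, triple_embs, triple_embs)
        return weights.squeeze()          # one relevance score per triple

def expand_query(query, triples, scores, top_n=5):
    """Append verbalizations of the top-scoring (head, relation, tail) triples."""
    keep = sorted(zip(scores.tolist(), triples), reverse=True)[:top_n]
    facts = "; ".join(f"{h} {r} {t}" for _, (h, r, t) in keep)
    return f"{query} [facts: {facts}]"
```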
4. Query Fusion with LLM Adaptation and Query Expansion
- Query-Adaptive LoRA Fusion: In multi-domain adapter fusion for LLMs, query-adaptive, data-free methods (qa-FLoRA) compute per-layer fusion weights for several LoRA modules using KL-divergence between adapter-augmented and base-model output distributions. Fusion weights reflect per-query domain relevance and can be computed layer-wise at inference with no additional training, outperforming static and centroid-similarity baselines by 5–10% across math, code, and medical tasks (Shukla et al., 12 Dec 2025).
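A sketch of the data-free weight computation in the spirit of qa-FLoRA, shown for a single layer (the method computes such weights layer-wise). Mapping KL divergences to weights with a softmax, and treating larger divergence as stronger domain signal, are illustrative assumptions rather than the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def fusion_weights(base_logits, adapter_logits_list, temperature=1.0):
    """Per-query fusion weights: compare each adapter-augmented output
    distribution against the base model's on the current query."""
    base_log_probs = F.log_softmax(base_logits, dim=-1)
    kls = torch.stack([
        F.kl_div(base_log_probs, F.softmax(logits, dim=-1), reduction="batchmean")
        for logits in adapter_logits_list
    ])
    # Softmax over KL values yields one weight per LoRA module (assumption).
    return F.softmax(kls / temperature, dim=0)

def fuse_lora_deltas(deltas, weights):
    """Combine per-adapter weight deltas using the computed fusion weights."""
    return sum(w * d for w, d in zip(weights, deltas))
```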
- Zero-Shot Query Expansion and Fusion for Sparse Retrieval: LLM-generated passage expansions (zero-shot QE) are merged with original queries using adaptive RRF, leveraging both original results and generated hypotheses. Two retrieval routes are run in parallel, and fusion weights are tuned to boost consensus-retrieved documents. Ablation reveals strong gains only when both original and expanded query result sets are fused; adding more routes offers limited further benefit due to the mixture of relevant and noisy expansions (Liu et al., 5 Jun 2025).
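A minimal sketch of weighted RRF over the two routes described above; the per-route weights are the tunable fusion parameters, and the values shown are placeholders, not the paper's tuned settings.

```python
def weighted_rrf(run_original, run_expanded, w_orig=0.6, w_exp=0.4, k=60):
    """Fuse the original-query and expanded-query routes; documents retrieved
    by both routes accumulate both contributions, boosting consensus hits."""
    fused = {}
    for w, run in ((w_orig, run_original), (w_exp, run_expanded)):
        for rank, doc in enumerate(run, start=1):
            fused[doc] = fused.get(doc, 0.0) + w / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```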
5. Query Fusion in Graph Neural Networks and Personalized Recommendation
- Query-Attentive Fusion Layer: In personalized paper recommendation (AMinerGNN), a heterogeneous graph embedding approach projects users, papers, and keywords into a unified feature space. A distance-covariance-driven attention dynamically weights the fusion between the user embedding and the keyword embedding, adapting to personalization, search, or interdisciplinary scenarios. The resulting fused query is used in standard CTR prediction, yielding 5–30% improvements in AUC and precision (Huai et al., 2022).
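A simplified PyTorch sketch of the gating idea: the paper derives the fusion weight from distance covariance, which is replaced here by a small learned gate purely for illustration.

```python
import torch
import torch.nn as nn

class QueryAttentiveFusion(nn.Module):
    """Fuse user and keyword embeddings with a scalar gate: alpha near 1 yields
    a personalization-dominated query, alpha near 0 a search-dominated one."""

    def __init__(self, dim=128):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, user_emb, keyword_emb):
        alpha = self.gate(torch.cat([user_emb, keyword_emb], dim=-1))
        return alpha * user_emb + (1 - alpha) * keyword_emb   # fused query
```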
6. Query Fusion in Few-Shot Learning and Medical Image Segmentation
- Support-Query Prototype Fusion: Few-shot segmentation frameworks (SQPFNet) construct prototypes from both support and query images. A coarse segmentation mask from the support-derived prototype guides the construction of a query-specific prototype based on query image patterns, which is then fused with the support prototype via weighted sum. The fused prototype yields improved mask precision and reduced intra-class variation, leading to consistent state-of-the-art gains on multiple medical image datasets (Wu et al., 2024).
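A PyTorch sketch of the prototype-fusion loop, assuming feature maps and a support mask at matching resolution; the fixed cosine threshold for the coarse mask and the fixed mixing weight are illustrative simplifications of SQPFNet's learned components.

```python
import torch
import torch.nn.functional as F

def masked_prototype(features, mask):
    """Masked average pooling: features (B, C, H, W), mask (B, 1, H, W) -> (B, C)."""
    return (features * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)

def fused_prototype(support_feats, support_mask, query_feats, lam=0.5):
    """Support prototype -> coarse query mask -> query prototype -> weighted sum."""
    p_support = masked_prototype(support_feats, support_mask)            # (B, C)
    # Coarse query mask: cosine similarity of each query location to p_support.
    sim = F.cosine_similarity(query_feats, p_support[..., None, None], dim=1)
    coarse_mask = (sim.unsqueeze(1) > 0.5).float()                       # (B, 1, H, W)
    p_query = masked_prototype(query_feats, coarse_mask)
    return lam * p_support + (1 - lam) * p_query
```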
7. Semantic Query Fusion for Zero-Shot Composed Image Retrieval
- Semantic Query-Augmented Fusion (SQAF): For Training-Free Zero-Shot Composed Image Retrieval, SQAF enhances the CLIP-derived query embedding with MLLM-generated high-level target captions. Fused embeddings blend visual grounding, modifier semantics, and MLLM-induced semantic proxies, which improves intent capture and retrieval accuracy. Optimal fusion weights (β) and the number of caption variants are empirically established; excessive caption diversity can dilute the signal (Wu et al., 30 Sep 2025).
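A minimal sketch of the embedding-level fusion, assuming the CLIP composed-query embedding and the MLLM caption embeddings are precomputed; averaging the caption variants and the beta value shown are illustrative, since the paper tunes both empirically.

```python
import torch
import torch.nn.functional as F

def sqaf_query(clip_query_emb, caption_embs, beta=0.7):
    """Blend the CLIP-derived query embedding with the mean of MLLM-generated
    target-caption embeddings; retrieval then uses cosine similarity."""
    semantic = F.normalize(caption_embs.mean(dim=0), dim=-1)   # (D,)
    fused = beta * F.normalize(clip_query_emb, dim=-1) + (1 - beta) * semantic
    return F.normalize(fused, dim=-1)
```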
The field of query fusion exhibits strong empirical and theoretical advantages across domains, including clear effectiveness gains, computational tractability, robustness to noise and incompleteness, and applicability to distributed, multi-modal, and multi-query scenarios. Remaining open questions concern dynamic fusion weight learning, optimal variant selection, scalability of LLM-driven graph extraction, and domain transfer of semantic fusion. For all practical deployments, the most reliable gains are achieved by combining a moderate number of high-quality query sources or variants using robust ranking-based fusion with judicious attention to per-query relevance and modality context.