Query-Aware Retrieval
- Query-aware retrieval is an information retrieval paradigm that conditions candidate representations on the current query, capturing dynamic user intent.
- It employs techniques such as query-specific fusion, graph propagation, and object-aware query perturbation to refine relevance during search.
- Empirical results show improvements like +20–26% Recall@50 and enhanced interpretability across modalities including text, image, and video.
Query-aware retrieval is a class of information retrieval algorithms and architectures in which the representation, search, or ranking of candidate items is explicitly adapted to the current query. This paradigm contrasts with static or context-free retrieval pipelines that employ fixed embeddings or scoring functions. Query-awareness can manifest in model architectures, feature construction, cross-modal fusion, graph-based propagation, and iterative refinement, allowing retrieval systems to capture finer-grained user intent, dependence on query structure, and dynamic semantic feedback during search. Recent literature demonstrates substantial improvements in recall, ranking accuracy, and interpretability across text, image, video, scientific-document, multimodal, and fairness-aware settings.
1. Foundational Principles of Query-Aware Retrieval
The defining trait of query-aware retrieval systems is that the query participates directly in candidate representation or scoring. Traditional systems map queries and candidates independently into a latent feature space and score them by a static similarity function (cosine, dot product, L2). Query-aware methods, in contrast, condition representations, scoring, or attention computations on the query or its context.
Key operational principles include:
- Early fusion: Modality features are weighted or combined via functions dependent on the query semantics, as in CONQUER's query-dependent fusion for video moment retrieval (Hou et al., 2021).
- Contextual augmentation: Retrieval models incorporate query context signals—such as web search snippets or LLM-rewrites—into unified representations using architectures like Fusion-in-Decoder, robustly handling partial or missing context (Mohankumar et al., 19 Jul 2024).
- Query-conditioned graph propagation: Document or chunk graphs are constructed with edge weights or attention mechanisms that are functions of the query, enabling recall beyond the initial candidate pool (Rathee et al., 26 Oct 2024, Agrawal et al., 25 Jul 2025).
- Query perturbation and projection: Embedding spaces are enriched by components projected onto subspaces linked to detected objects or aspects, amplifying signals relevant to the query (Sogi et al., 17 Jul 2024, Uy et al., 2020).
The precise mathematical construction varies with the application: dense retrieval, moment localization, cross-modal matching, multiple-choice question answering (MCQA) evidence selection, and multimodal data lake indexing.
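As a concrete illustration of the contrast drawn above, the following minimal Python sketch compares a static dot-product scorer with a query-conditioned one; the sigmoid-gate construction and all names here are hypothetical, chosen for exposition rather than drawn from any cited system.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                              # embedding dimension
query = rng.normal(size=d)          # query embedding
cands = rng.normal(size=(100, d))   # candidate embeddings

# Static (context-free) retrieval: candidates are scored by a fixed
# similarity function, with no adaptation to the query's semantics.
static_scores = cands @ query

# Query-aware retrieval (hypothetical sketch): a sigmoid gate computed
# from the query re-weights each candidate dimension before scoring,
# so the effective candidate representation changes per query.
W_gate = 0.1 * rng.normal(size=(d, d))
gate = 1.0 / (1.0 + np.exp(-W_gate @ query))
aware_scores = (cands * gate) @ query

print("static top-5:     ", np.argsort(-static_scores)[:5])
print("query-aware top-5:", np.argsort(-aware_scores)[:5])
```

In real systems the gate would be a learned function (e.g., an MLP over query tokens); the point is only that the candidate side of the score is no longer query-independent.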
2. Architectures and Mathematical Formalizations
Many architectures have emerged to operationalize query-awareness:
- Query-specific fusion weights: In CONQUER (Hou et al., 2021), modality fusion weights for visual and textual features are computed by NetVLAD over the query tokens, ensuring adaptive representation per query.
- Graph neural networks with query-conditioned attention: Enhanced GATs (EGAT) inject the query embedding into each per-edge and per-node attention computation, and the graph pooling and final scoring are likewise query-dependent (Agrawal et al., 25 Jul 2025); a hedged sketch of this attention form follows the list.
- Fusion-in-Decoder context enrichment: Multi-source signals (web snippets, LLM profiles) are concatenated and processed jointly in the sequence dimension to produce context-aware hidden states and embedding vectors (Mohankumar et al., 19 Jul 2024).
- Object-aware query perturbation: For cross-modal retrieval, queries are decomposed into components parallel and orthogonal to PCA subspaces extracted from detected objects in the candidate image, and the parallel components are selectively amplified (Sogi et al., 17 Jul 2024); a sketch of the decomposition follows the list.
- Options-aware embedding composition in MCQA: The query embedding is constructed by concatenating the question with multiple candidate options, and trained to mimic the oracle representation via contrastive loss (Singh et al., 27 Jan 2025).
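The cited EGAT formulation is not reproduced here; as an illustrative sketch of how a query embedding $q$ can enter a GAT-style attention score, one might write (the projections $W$, $W_q$, attention vector $a$, and concatenation $\|$ are expository assumptions, not the paper's notation):

$$
e_{ij} = \mathrm{LeakyReLU}\!\left(a^{\top}\big[\,W h_i \,\|\, W h_j \,\|\, W_q q\,\big]\right),
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})},
$$

so that the attention weight on each edge $(i, j)$ depends on the query as well as the incident node features $h_i$ and $h_j$.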
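In the same hedged spirit, the object-aware perturbation can be written as a decomposition of the query embedding $q$ with respect to an object subspace; the projector $P_O$ and amplification factor $\lambda$ are notational assumptions:

$$
q = P_O q + (I - P_O)\,q, \qquad \tilde{q} = (1 + \lambda)\, P_O q + (I - P_O)\,q, \quad \lambda > 0,
$$

where $P_O$ projects onto the PCA subspace of detected-object features, so the query component aligned with those objects is selectively amplified before similarity scoring.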
3. Graph-Based and Adaptive Retrieval Techniques
Graph-based and adaptive retrieval approaches exploit inter-item relations and propagate query-conditioned relevance signals beyond the initial candidate set.
- Query affinity modelling: Quam builds a relevance-aware document similarity graph whose edge weights encode co-relevance learned via cross-encoder fine-tuning. Unseen candidates are scored by an expected set affinity that propagates relevance estimates along graph edges (Rathee et al., 26 Oct 2024); a sketch of this propagation follows the list. The iterative propagation is core to adaptive retrieval, boosting recall under tight ranking budgets.
- Knowledge graph augmentation and query expansion: In retrieval-augmented generation, query-aware methods use multi-path KG fusion and attention reward models to select and enrich subgraphs most semantically aligned to the query, with expansions reflecting entity, relation, and context alignment (Wei et al., 7 Jul 2025).
- Document-based relation filtering (DRF) for semi-structured queries: By embedding document nodes and scoring neighbor relations by similarity to query embeddings, knowledge-aware retrieval tightly matches relational and textual constraints (Xia et al., 17 Oct 2024).
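A minimal sketch of the affinity propagation referenced above, assuming learned edge weights $w(d, d')$ and a scored seed set $R$ (notation ours, not necessarily the paper's):

$$
\hat{s}(d) = \sum_{d' \in R} w(d, d')\, s(d'),
$$

where $s(d')$ is the relevance estimate of an already-scored document and $\hat{s}(d)$ prioritizes unseen neighbors for admission into the remaining scoring budget.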
4. Applications Across Modalities and Retrieval Tasks
The query-aware paradigm extends across numerous IR tasks:
- Cross-modal image-text retrieval: Object-aware query perturbation enhances model sensitivity to small or semantically critical objects, correcting for information asymmetry between text and image (Sogi et al., 17 Jul 2024).
- Video moment retrieval: Query-dependent fusion and attention tightly couple clips and query semantics for fine-grained temporal localization (Hou et al., 2021, Yang et al., 2020).
- Document-to-document scientific retrieval: PRISM decomposes full papers into multi-aspect subqueries (research question, method, experiments), retrieves per aspect, and fuses the per-aspect rankings via reciprocal rank fusion (Park et al., 14 Jul 2025); a sketch of the fusion step follows the list.
- Multiple-choice QA: OADR constructs option-dependent query embeddings, improving evidence-sentence matching and end-to-end QA accuracy (Singh et al., 27 Jan 2025); its contrastive training objective is sketched after the list.
- Fairness-aware search: FAIR-QR refines the query iteratively to improve exposure of underrepresented groups, balancing relevance with group fairness (Chen et al., 27 Mar 2025).
- Multimodal data lakes: MQRLD's query-aware feature transformation selects and optimizes extrinsic and intrinsic metrics (recall, latency, and cross-bucket rate, CBR) via Bayesian optimization over observed query workloads (Sheng et al., 29 Aug 2024).
- Remote sensing retrieval: Knowledge-aware expansion fuses structured KG signals with captions to better align short text queries with richly detailed multi-object imagery (Mi et al., 6 May 2024).
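Reciprocal rank fusion, which PRISM uses to merge per-aspect rankings, admits a compact sketch; the constant $k = 60$ follows common RRF convention, and the aspect rankings below are toy data.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document earns 1/(k + rank) from
    every list it appears in (rank is 1-based)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy per-aspect rankings: research question / method / experiments.
by_question = ["d3", "d1", "d7"]
by_method = ["d1", "d3", "d9"]
by_experiments = ["d1", "d9", "d3"]

print(reciprocal_rank_fusion([by_question, by_method, by_experiments]))
# d1 and d3 surface first because they rank highly across aspects.
```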
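The OADR training objective can likewise be sketched, hedged, as a standard InfoNCE-style contrastive loss that pulls the option-augmented query embedding $z_q$ toward the oracle embedding $z^{+}$ against in-batch negatives $z_j$; the temperature $\tau$ and this exact form are assumptions, not the paper's verbatim loss:

$$
\mathcal{L} = -\log \frac{\exp\!\big(\mathrm{sim}(z_q, z^{+}) / \tau\big)}{\sum_{j} \exp\!\big(\mathrm{sim}(z_q, z_j) / \tau\big)}.
$$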
5. Experimental Impact and Performance
Empirical results consistently indicate material improvements from query-aware techniques:
- Recall and ranking metrics: Quam yields +20–26% gains in Recall@50 vs. static re-ranking, especially under tight budgets (Rathee et al., 26 Oct 2024). Object-aware query perturbation improves mR@1 for small objects by ~2.7 points (Sogi et al., 17 Jul 2024). KTIR obtains +0.88–1.28 points of mean Recall in remote sensing (Mi et al., 6 May 2024). PRISM improves fine-grained document-to-document Recall@K by 4.3–7% (Park et al., 14 Jul 2025).
- Robustness: Context-glancing curricula ensure that context-aware retrievers degrade gracefully under missing or partial query context (Mohankumar et al., 19 Jul 2024). Graph neural architectures achieve higher recall as query complexity rises, with consistent performance on multi-hop QA tasks (Agrawal et al., 25 Jul 2025).
- Efficiency and interpretability: FAIR-QR preserves high relevance and fairness scores with a transparent, iterative refinement history visible to auditors (Chen et al., 27 Mar 2025). MQRLD reduces cross-bucket scan rates to <10% on real multimodal workloads (Sheng et al., 29 Aug 2024).
6. Limitations, Open Challenges, and Future Directions
Limitations of current query-aware retrieval frameworks include:
- Dependency on external resources: Knowledge augmentation in KTIR and KAR requires high-coverage, high-quality knowledge graphs, with performance sensitive to entity linking and relation extraction (Mi et al., 6 May 2024, Xia et al., 17 Oct 2024).
- Computation and memory overhead: Tree-based fusion, per-query graph construction, and multi-agent inference impact throughput and scaling, though many systems (e.g., CONQUER and MQRLD) optimize for QPS and cluster traversal (Hou et al., 2021, Sheng et al., 29 Aug 2024).
- Fine granularity and cold-start: Some query-aware methods depend on precomputed context or require sufficient feedback data for optimization, highlighted by context-glancing and workload-driven transformations (Mohankumar et al., 19 Jul 2024, Sheng et al., 29 Aug 2024).
- Complexity of aspect decomposition: For scientific retrieval, multi-aspect agents may not fully disambiguate relevance dimensions if query or candidate documents lack clear segmentation (Park et al., 14 Jul 2025).
Ongoing research trends include:
- Dynamic, joint learning of context signals and knowledge selection (Mi et al., 6 May 2024, Xia et al., 17 Oct 2024);
- End-to-end fusion of graph propagation and multimodal feature representations (Agrawal et al., 25 Jul 2025, Wei et al., 7 Jul 2025);
- Real-time adaptivity using continual query logs for index optimization (Sheng et al., 29 Aug 2024);
- Extension to multi-hop reasoning and hybrid symbolic–neural models.
7. Comparative Table: Major Query-Aware Retrieval Frameworks
| Framework | Query-Awareness Mechanism | Empirical Impact |
|---|---|---|
| CONQUER (Hou et al., 2021) | Query-dependent fusion + bi-attention in video | +2–3% R@1 on TVR, DiDeMo |
| Quam (Rathee et al., 26 Oct 2024) | Query-affinity graph propagation | +20–26% Recall@50; robust at low budget |
| Object-Aware Q-Perturb (Sogi et al., 17 Jul 2024) | PCA subspace query enhancement | +2.67 points mR@1 small objects |
| PRISM (Park et al., 14 Jul 2025) | Multi-aspect query agent, chunk retrieval | +4.3–7% Recall@K |
| KTIR (Mi et al., 6 May 2024) | KG-driven enrichment of text queries | +0.88–1.28 points mean Recall |
| FAIR-QR (Chen et al., 27 Mar 2025) | Iterative query refinement for fairness | Highest nDCG×AWRF on TREC Fair |
| A²ATS (He et al., 18 Feb 2025) | Query-aware vector quantization for KV cache | 2.1–2.7× tokens/s throughput |
| QMKGF (Wei et al., 7 Jul 2025) | Query-aware KG fusion with attention reward | +7.9–9.7 ROUGE-1 multi-hop QA |
These systems collectively demonstrate an ongoing shift from static, one-size-fits-all retrieval pipelines to dynamic, intent-sensitive, and contextually enriched ones. Query-aware retrieval thus represents a central, rapidly evolving direction in IR, with broad applicability across modalities, domains, and fairness concerns.