Decide-Then-Retrieve (DTR) Framework

Updated 8 January 2026

Decide-Then-Retrieve (DTR) is a paradigm that separates the decision to retrieve external knowledge from the actual retrieval process.
It employs decision mechanisms like intent classifiers and uncertainty measurement to trigger retrieval only when necessary, minimizing redundant calls.
Empirical results show that DTR reduces latency and computational cost while maintaining or improving task accuracy across various applications.

Decide-Then-Retrieve (DTR) is a paradigm in information retrieval and retrieval-augmented generation (RAG) that introduces an explicit separation between the decision to retrieve external knowledge and the retrieval operation itself. Rather than indiscriminately invoking retrieval for every query or generation step, DTR frameworks first estimate whether parametric model knowledge is sufficient to satisfy the information need, and only invoke retrieval if the decision mechanism determines external grounding is required. DTR architectures have been realized for open-domain and multi-hop question answering, conversational QA, FAQ search in product search engines, knowledge graph QA, video moment retrieval, and LLM-based collaborative search. Recent empirical evidence demonstrates DTR can reduce retrieval calls, latency, and token consumption while maintaining or even improving task accuracy, response relevance, and user satisfaction (Chen et al., 2023, Chen et al., 7 Jan 2026, Song et al., 23 Oct 2025, Roy et al., 2024, Liu et al., 15 Aug 2025, Dhole, 16 Jan 2025, Tan et al., 2024).

1. Key Principles and Motivation

The central principle of DTR is to gate retrieval adaptively using learned classifiers, uncertainty estimates, proxy models, or planning modules. Standard RAG approaches invoke retrieval unconditionally, which can introduce irrelevant or even misleading evidence, increase computational cost, and degrade generation quality, particularly for queries that are easily answerable from parametric knowledge alone or when the input is ambiguous or sparse (Chen et al., 7 Jan 2026, Dhole, 16 Jan 2025). DTR explicitly seeks to minimize spurious retrievals and to gather external evidence only where it is likely to help.

Major motivations include:

Efficiency: DTR pipelines dramatically reduce unnecessary retrieval calls, lowering compute and latency. For example, gating FAQ retrieval using query intent reduced BERT rerank latency by 95%, with only 5% of traffic requiring expensive reranking (Chen et al., 2023).
Quality: In conversational and multi-hop QA, DTR curtails the introduction of irrelevant context, thus improving groundedness and answer faithfulness (Roy et al., 2024, Song et al., 23 Oct 2025).
Scalability: By decoupling the decision, generation, and retrieval stages, DTR systems enable extensible architecture and modular optimization (Chen et al., 2023, Tan et al., 2024).

2. Decision Mechanisms in DTR Frameworks

DTR decision modules take several forms depending on application:

Intent Classifiers: In FAQ retrieval systems, a fine-tuned RoBERTa-large classifier, trained with cross-entropy loss, predicts whether a query expresses question intent. Queries with $P(\text{intent} = \text{question} \mid \text{query}) \geq \tau$ trigger retrieval; all others are handled as standard product search (Chen et al., 2023).
Uncertainty-Guided Triggering (UGT): Frameworks for open-domain QA use generation uncertainty (normalized negative log-likelihood) as a confidence metric. If the uncertainty $u$ satisfies $u > \tau$, retrieval is triggered; otherwise, the parametric LLM output is accepted (Chen et al., 7 Jan 2026, Dhole, 16 Jan 2025). Alternative uncertainty measures include degree matrix variance (Jaccard, NLI), spectral “eccentricity” of output clusters, and the cardinality of semantic sets (Dhole, 16 Jan 2025).
Proxy Model Heuristics: SlimPLM employs a distilled LLM to generate “heuristic answers” and then a lightweight judgment model to decide if knowledge is present or retrieval is necessary (Tan et al., 2024).
Planning Modules: For complex multi-hop QA over knowledge graphs, Graph-RFT explicitly decomposes queries into subquestions using a Cartesian-inspired planning module and uses RL to schedule graph vs. web retrieval (Song et al., 23 Oct 2025).
Conversational Decision Heads: SELF-multi-RAG interleaves a classification head that outputs [Retrieve, No Retrieve, Continue] based on multi-turn context, supervised using “multi-turn critic” labels for human-like retrieval timing (Roy et al., 2024).
Multimodal Denoising: In video retrieval, DRNet masks irrelevant video clips via text-conditioned cross-attention and state space blocks before retrieval, acting as an explicit denoising decision point (Liu et al., 15 Aug 2025).
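
The uncertainty-guided trigger described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from any of the cited papers: the function names `normalized_nll` and `should_retrieve`, the example log-probabilities, and the threshold value 0.5 are all hypothetical. It assumes the per-token log-probabilities of the model's draft answer are available and takes the uncertainty $u$ to be their length-normalized negative log-likelihood.

```python
import math
from typing import Sequence


def normalized_nll(token_logprobs: Sequence[float]) -> float:
    """Length-normalized negative log-likelihood of a generated answer."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def should_retrieve(token_logprobs: Sequence[float], tau: float) -> bool:
    """Trigger retrieval only when the uncertainty u exceeds the threshold tau."""
    return normalized_nll(token_logprobs) > tau


# Hypothetical draft answers: one generated confidently, one not.
confident = [math.log(0.9), math.log(0.95), math.log(0.85)]
uncertain = [math.log(0.3), math.log(0.2), math.log(0.4)]

TAU = 0.5  # illustrative value; a real threshold would be tuned on held-out data

print(should_retrieve(confident, TAU))  # False: accept the parametric answer
print(should_retrieve(uncertain, TAU))  # True: fall back to retrieval
```

The same gate shape applies to the intent-classifier variant, with the comparison reversed: there, retrieval fires when the predicted question-intent probability is at least $\tau$.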

3. Retrieval and Evidence Selection Strategies

When retrieval is triggered, DTR frameworks offer targeted evidence selection:

Query Reformulation: Text generation models (BART, T5, SlimPLM QR) rewrite sparse or keyword queries into well-formed natural questions for improved retrieval (Chen et al., 2023, Tan et al., 2024, Roy et al., 2024).
Dual-Path Retrieval (DPR): DTR can simultaneously issue retrieval requests using the original query and a pseudo-context generated by the LLM, retrieving and uniting evidence from both signals. Adaptive scoring fuses cosine similarities using geometrically motivated joint angle minimization (Chen et al., 7 Jan 2026).
Selective Filtering: SELF-multi-RAG judges passage relevance via a trained softmax head, discarding low-scoring retrievals for final answer construction; weighting includes faithfulness and utility self-reflections (Roy et al., 2024).
State Space Denoising: DRNet applies structured state space blocks and dynamic convolutions for multimodal clip filtering in VMR. Only features that pass semantic consistency checks are used for conditional retrieval (Liu et al., 15 Aug 2025).
Subclaim Decomposition: SlimPLM rewrites proxy answers into atomic claim queries, invokes retrieval only for missing knowledge fragments, and composes the final answer from concatenated retrieved documents and parametric context (Tan et al., 2024).
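
As a rough illustration of the dual-path idea listed above, the sketch below retrieves candidates once with the query embedding and once with a pseudo-context embedding, unions the two candidate sets, and re-scores the union. Two points are assumptions rather than details from the cited work: the fusion rule here is a plain maximum over the two cosine similarities, used as a stand-in for the adaptive angle-based scoring, and all names (`cosine`, `top_k`, `dual_path_retrieve`) and the toy two-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          doc_vecs: Dict[str, Sequence[float]], k: int) -> Dict[str, float]:
    """Return the k documents most similar to query_vec, best first."""
    scored = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    best = sorted(scored, key=scored.get, reverse=True)[:k]
    return {doc_id: scored[doc_id] for doc_id in best}


def dual_path_retrieve(query_vec: Sequence[float], pseudo_vec: Sequence[float],
                       doc_vecs: Dict[str, Sequence[float]],
                       k: int) -> List[Tuple[str, float]]:
    """Union candidates from both paths, then rank by a fused score."""
    candidates = set(top_k(query_vec, doc_vecs, k)) | set(top_k(pseudo_vec, doc_vecs, k))
    # Stand-in fusion rule (assumption): keep the better of the two similarities.
    fused = {
        doc_id: max(cosine(query_vec, doc_vecs[doc_id]),
                    cosine(pseudo_vec, doc_vecs[doc_id]))
        for doc_id in candidates
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy corpus and embeddings, invented for illustration.
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
query_vec = [1.0, 0.1]   # embedding of the original query
pseudo_vec = [0.2, 1.0]  # embedding of an LLM-generated pseudo-context

print(list(top_k(query_vec, docs, 2)))  # ['d1', 'd3']
print([doc_id for doc_id, _ in dual_path_retrieve(query_vec, pseudo_vec, docs, 2)])  # ['d1', 'd2']
```

In this toy case the pseudo-context path surfaces `d2`, which the query-only path ranks last; that is the kind of complementary evidence the two-signal design is meant to capture.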

4. Architecture and Optimization

DTR architectures exhibit strongly modular design:

Pipeline Decoupling: Decision, query generation, and retrieval occur in strictly separated modules, each tuned independently (e.g., RoBERTa intent, T5/BART reformulation, BERT/SBERT retrieval in product FAQ search) (Chen et al., 2023).
End-to-End Optimization: Some frameworks use reinforcement learning (Graph-RFT) with multi-reward signals, including answer accuracy, retrieval sufficiency, coverage, and explicit penalties to optimize retrieval scheduling (Song et al., 23 Oct 2025). SELF-multi-RAG supervises multi-head outputs via cross-entropy over decision, relevance, grounding, and utility (Roy et al., 2024).
Proxy Collaboration: SlimPLM interposes a lightweight LLM for knowledge gap detection, orchestrating targeted retrieval and reducing big-model inference frequency (Tan et al., 2024).
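
The decoupled layout can be pictured as a pipeline whose stages are independent, swappable callables. The sketch below is only one possible arrangement under assumptions: the `DTRPipeline` class, its field names, and the toy stage functions are hypothetical and do not come from any of the cited systems, where each stage would be a trained model or a search backend.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DTRPipeline:
    """Decision, reformulation, retrieval, and generation as separate stages."""
    decide: Callable[[str], bool]              # True means "retrieval is needed"
    reformulate: Callable[[str], str]          # rewrite the query for the retriever
    retrieve: Callable[[str], List[str]]       # return evidence passages
    generate: Callable[[str, List[str]], str]  # answer with optional evidence

    def answer(self, query: str) -> str:
        if not self.decide(query):
            # Parametric-only path: the retriever is never called.
            return self.generate(query, [])
        passages = self.retrieve(self.reformulate(query))
        return self.generate(query, passages)


# Toy stages, invented for illustration.
retrieval_log: List[str] = []


def toy_retrieve(query: str) -> List[str]:
    retrieval_log.append(query)  # record each retrieval call
    return [f"passage about {query}"]


pipeline = DTRPipeline(
    decide=lambda q: "return policy" in q,
    reformulate=lambda q: f"What is the {q}?",
    retrieve=toy_retrieve,
    generate=lambda q, passages: f"{q} -> {len(passages)} passage(s)",
)

print(pipeline.answer("red running shoes"))  # red running shoes -> 0 passage(s)
print(pipeline.answer("return policy"))      # return policy -> 1 passage(s)
print(retrieval_log)                         # ['What is the return policy?']
```

Because each stage is passed in as a plain callable, any one of them can be replaced or ablated without touching the others, which is the property the modular designs above rely on.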

5. Empirical Evaluation and Quantitative Performance

Extensive experimental results validate DTR effectiveness:

FAQ Retrieval:
- Hit@1 improved by +0.13 (absolute) when using T5-reformulation vs. raw BM25 queries.
- BERT-Rerank + T5 pipeline latency reduced to 0.046× baseline by avoiding ~95% of retrieval calls (Chen et al., 2023).
Open-Domain QA:
- On five QA benchmarks, DTR (UGT+DPR) consistently improved EM and F1 vs. standard RAG. For instance, Qwen2.5-72B raised EM/F1 from 38.83/50.73 (RAG) to 40.46/52.14 (DTR), with fewer retrieval calls (Chen et al., 7 Jan 2026).
Long-Form QA (Uncertainty Detection):
- Eccentricity-based DTR reduced retrieval calls (ret-ratio ~0.64) at the cost of a modest F1 reduction (0.605 → 0.561) (Dhole, 16 Jan 2025).
Conversational QA:
- SELF-multi-RAG improved recall@5 by +15% using summary-based rewriting, and human-rated answer quality by ~13% on three datasets (Roy et al., 2024).
Knowledge Graph QA:
- Graph-RFT improved CWQ (4-hop) accuracy to 67.2% vs. PoG’s 62.6%, with adaptive retrieval scheduling according to KG completeness (Song et al., 23 Oct 2025).
Video Moment Retrieval:
- DRNet achieved Recall@1 at IoU 0.5 of 66.73 and mAP improvements of more than 2 points over SOTA on QVHighlights; ablating denoising dropped mAP by 11.7% (Liu et al., 15 Aug 2025).
SlimPLM Proxy-Based QA:
- Highest EM (30.73) on ASQA and NQ, best ROUGE-1 (29.97) on ELI5, with exactly one big-model call per question (Tan et al., 2024).
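
For readers unfamiliar with the EM and F1 figures quoted above, the sketch below shows one common way these QA metrics are computed: exact match after light answer normalization, and token-overlap F1. It is an assumption that the cited benchmarks use this exact normalization (lowercasing, stripping punctuation and English articles); individual papers may differ, and the function names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "Eiffel Tower"))                 # 1.0
print(round(token_f1("Eiffel Tower in Paris", "Eiffel Tower"), 3))     # 0.667
```

In the second example the prediction contains both gold tokens plus two extra ones, so recall is 1.0, precision is 0.5, and F1 is 2/3. Benchmark scores are averages of these per-question values, usually reported on a 0 to 100 scale.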

6. Domain-Specific Variants and Extensions

DTR has been adapted for diverse modalities and domains:

Product Search: Intent-aware FAQ retrieval, with specialized classifiers and reformulators tuned for e-commerce queries (Chen et al., 2023).
Knowledge Reasoning: Cartesian-style planning for multi-hop QA over incomplete knowledge graphs (Song et al., 23 Oct 2025).
Retrieval-Augmented Generation for QA: Uncertainty-guided triggering and dual-path retrieval for efficient factual answering (Chen et al., 7 Jan 2026, Dhole, 16 Jan 2025).
Conversational Systems: Multi-turn context decision heads, summarization-driven rewrites, and passage self-reflection (Roy et al., 2024).
Video Retrieval: Explicit denoising and multimodal purification prior to semantic retrieval (Liu et al., 15 Aug 2025).
Collaborative LLM-Proxy Systems: Knowledge gap detection and claim-level targeted evidence assembly (Tan et al., 2024).

7. Implementation and Deployment Considerations

Several practical insights arise from DTR deployments:

Latency and Throughput: Gating retrieval yields substantial latency reduction (e.g., sub-50 ms SLA at 50 QPS for product FAQ DTR) (Chen et al., 2023).
Cost Efficiency: Proxy-based DTR attains state-of-the-art accuracy with lower inference and token consumption (Tan et al., 2024).
Modularity: Decoupled components facilitate rapid iteration, robust ablations, and scalable UI design (e.g., unified FAQ display atop product grids) (Chen et al., 2023).
Ablation Robustness: Removing the decision or denoising module degrades performance in the reported ablations across domains, which indicates that these modules contribute materially to the gains (Chen et al., 2023, Liu et al., 15 Aug 2025, Tan et al., 2024).
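
The latency argument can be made concrete with a simple back-of-the-envelope model. The sketch below assumes, purely for illustration, that every query pays a fixed gate cost and that a fraction of queries additionally pays a fixed retrieval cost; the function names and the millisecond figures are hypothetical and are not measurements from the cited deployments.

```python
def expected_latency_ms(gate_ms: float, retrieve_ms: float,
                        retrieve_rate: float) -> float:
    """Mean per-query latency when only a fraction of queries retrieve."""
    if not 0.0 <= retrieve_rate <= 1.0:
        raise ValueError("retrieve_rate must lie in [0, 1]")
    return gate_ms + retrieve_rate * retrieve_ms


def break_even_rate(gate_ms: float, retrieve_ms: float) -> float:
    """Retrieval rate below which gating beats always retrieving.

    Gating wins when gate_ms + p * retrieve_ms < retrieve_ms,
    that is, when p < 1 - gate_ms / retrieve_ms.
    """
    return 1.0 - gate_ms / retrieve_ms


# Hypothetical costs: a 2 ms gate in front of a 100 ms retrieve-and-rerank step.
gated = expected_latency_ms(gate_ms=2.0, retrieve_ms=100.0, retrieve_rate=0.05)
print(gated)                         # about 7 ms, versus 100 ms when always retrieving
print(break_even_rate(2.0, 100.0))   # about 0.98
```

Under these assumed numbers, gating pays off unless almost every query needs retrieval, which is why a cheap decision module in front of an expensive reranker is attractive. The model ignores tail latency and any accuracy cost of skipped retrievals, both of which matter in practice.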

Summary

Decide-Then-Retrieve represents a substantive methodological advance for retrieval-enhanced generation, spanning textual, conversational, multimodal, and graph-centric information-seeking tasks. By explicitly factoring out the retrieval decision, DTR frameworks improve efficiency and answer quality, adapt retrieval to knowledge gaps, and support robust reasoning even under incomplete or sparse input. The quantitative evaluations surveyed here generally show DTR matching or outperforming baselines on accuracy while sharply reducing retrieval operations and system latency, with occasional modest accuracy trade-offs such as the F1 drop reported for long-form QA (Chen et al., 2023, Chen et al., 7 Jan 2026, Song et al., 23 Oct 2025, Roy et al., 2024, Liu et al., 15 Aug 2025, Dhole, 16 Jan 2025, Tan et al., 2024).