Calibrating entropy-triggered retrieval and amortizing multi-query costs in RAG
Determine how to calibrate token-level entropy thresholds for uncertainty-triggered retrieval in Retrieval-Augmented Generation (RAG), specifically in approaches such as FLARE and RIND+QFS, so that the thresholds hold up across different application domains. Separately, develop strategies to amortize the computational and latency costs of issuing multiple reformulated queries under tight end-to-end latency budgets.
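To make the two halves of the problem concrete, the sketch below shows one naive baseline for each. It is not taken from the cited review, FLARE, or RIND+QFS, and it is not a proposed solution. It assumes (a) the threshold is set per domain as an empirical quantile of token entropies on a development set, so that retrieval fires on roughly a chosen fraction of tokens, and (b) reformulated queries are issued concurrently and any that miss the latency budget are dropped. The names `token_entropy`, `calibrate_threshold`, `should_retrieve`, `multi_query`, and the `retrieve` callable are all hypothetical.

```python
import asyncio
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def calibrate_threshold(dev_entropies, trigger_rate):
    """Assumed calibration rule: pick a per-domain threshold so that roughly
    `trigger_rate` of the development-set tokens have entropy above it."""
    if not dev_entropies:
        raise ValueError("need at least one development-set entropy")
    ordered = sorted(dev_entropies)
    # Position of the (1 - trigger_rate) empirical quantile, clamped to range.
    idx = int((1.0 - trigger_rate) * len(ordered)) - 1
    idx = min(len(ordered) - 1, max(0, idx))
    return ordered[idx]


def should_retrieve(probs, threshold):
    """Trigger retrieval when the token's entropy exceeds the threshold."""
    return token_entropy(probs) > threshold


async def multi_query(retrieve, queries, budget_s):
    """Run all reformulated queries concurrently and keep only the results
    that arrive within `budget_s` seconds; late queries are cancelled.

    `retrieve` is a hypothetical async callable: query -> list of documents.
    """
    tasks = [asyncio.create_task(retrieve(q)) for q in queries]
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        task.cancel()
    # Let the cancellations settle so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    results = []
    for task in tasks:  # preserve the original query order
        if task in done and not task.cancelled() and task.exception() is None:
            results.extend(task.result())
    return results
```

Under these assumptions, the wall-clock cost of the retrieval step is bounded by `budget_s` rather than by the sum of per-query latencies, at the price of discarding slow queries. Whether a quantile-based threshold transfers across domains is exactly the open question, so the target trigger rate would itself need to be validated per domain.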
References
Open questions include how to calibrate entropy thresholds across domains and how to amortise multi-query costs under tight latency budgets.
— A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges
(2508.06401 - Brown et al., 8 Aug 2025) in Section “What are the innovative methods and approaches compared to the standard retrieval augmented generation?”, Subsubsection “Prompting and Query Strategies—Query reformulation, expansion, and selective triggering of queries”