Formal theory for the RAG retrieval trade-off
Establish a formal and comprehensive theoretical explanation for the observed trade-off between the number of relevant documents and the number of totally irrelevant (random) documents included in the context of Retrieval-Augmented Generation (RAG) prompts for open-domain question answering. Such a theory should clarify two effects: why accuracy improves when a minimal set of retrieved documents is supplemented with random documents, and why performance degrades when many semantically related documents that do not contain the answer are included.
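To make the two quantities in the trade-off concrete, the following is a minimal sketch of how a context could be assembled from a small set of retrieved documents plus a chosen number of random ones. The function name `build_context`, its parameters, and the ordering (random documents first, retrieved documents last) are illustrative assumptions, not the setup of the cited paper.

```python
import random


def build_context(retrieved, corpus, n_random, seed=0):
    """Supplement retrieved documents with n_random documents drawn
    uniformly at random from the rest of the corpus.

    Hypothetical helper for illustration only; documents are plain strings.
    """
    rng = random.Random(seed)
    retrieved_set = set(retrieved)
    # Random documents must come from outside the retrieved set.
    pool = [doc for doc in corpus if doc not in retrieved_set]
    if n_random > len(pool):
        raise ValueError("not enough documents outside the retrieved set")
    noise = rng.sample(pool, n_random)
    # Assumed ordering: random documents first, retrieved documents last.
    return noise + list(retrieved)


corpus = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]
context = build_context(["doc_a", "doc_b"], corpus, n_random=2, seed=1)
print(len(context))
```

Varying the length of `retrieved` and the value of `n_random` while measuring answer accuracy is one way to trace out the trade-off empirically; explaining the resulting curve is the open problem.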
References
While establishing a formal or comprehensive theory behind these findings remains an open research challenge, we can still infer that there seems to be a trade-off between the number of relevant and totally irrelevant documents.
— The Power of Noise: Redefining Retrieval for RAG Systems
(2401.14887 - Cuconasu et al., 26 Jan 2024) in Results, Subsection "Retriever Trade-Off"