
Origin of Errors in RAG: Context Utilization vs. Context Sufficiency

Determine whether errors in Retrieval-Augmented Generation (RAG) systems arise because large language models fail to utilize the retrieved context or because the retrieved context is insufficient to answer the query.


Background

The paper studies Retrieval-Augmented Generation (RAG) systems, where LLMs receive external context at inference time to improve factuality and performance on open-domain question answering. A central uncertainty motivating the work is whether observed errors stem from the model’s inability to leverage the provided context or from the retrieval failing to supply information that is sufficient to answer the query.

To investigate this, the authors introduce the notion of “sufficient context” and develop an autorater that labels question–context pairs as sufficient or insufficient without requiring ground-truth answers. This framing enables stratified analyses of model behavior, but whether errors primarily reflect failures of context utilization or genuinely insufficient context is explicitly identified as an open question.
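To make the stratified analysis concrete, here is a minimal sketch of how error rates could be broken down by sufficiency labels. It assumes records already carry an autorater label and a correctness flag; the data structure, field names, and toy data are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class RagExample:
    question: str
    context_sufficient: bool  # autorater label: does the retrieved context suffice?
    model_correct: bool       # did the RAG model answer correctly?

def stratified_accuracy(examples: list[RagExample]) -> dict[str, float]:
    """Accuracy within the sufficient- and insufficient-context strata."""
    strata: dict[str, list[bool]] = {"sufficient": [], "insufficient": []}
    for ex in examples:
        key = "sufficient" if ex.context_sufficient else "insufficient"
        strata[key].append(ex.model_correct)
    return {
        key: (sum(outcomes) / len(outcomes) if outcomes else float("nan"))
        for key, outcomes in strata.items()
    }

if __name__ == "__main__":
    # Toy data for illustration only.
    data = [
        RagExample("Who wrote Hamlet?", context_sufficient=True, model_correct=True),
        RagExample("Who wrote Hamlet?", context_sufficient=False, model_correct=False),
        RagExample("Capital of France?", context_sufficient=True, model_correct=True),
        RagExample("Capital of France?", context_sufficient=False, model_correct=True),
    ]
    print(stratified_accuracy(data))
```

Under this framing, errors concentrated in the insufficient stratum would point toward retrieval shortfalls, while errors in the sufficient stratum would point toward the model failing to use context it was given.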

References

Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or the context itself is insufficient to answer the query.

Sufficient Context: A New Lens on Retrieval Augmented Generation Systems (2411.06037 - Joren et al., 9 Nov 2024) in Abstract