An Analysis of Retrieval-Augmented Generation with Multiple Documents
The paper "More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG" investigates the performance intricacies of retrieval-augmented generation (RAG) systems when handling an increased number of documents while maintaining a fixed context length. This paper is pertinent in the field of multi-hop question answering (QA), where LLMs are employed to synthesize information across multiple datasets.
Methodology
The authors test several state-of-the-art LLMs on custom datasets derived from the MuSiQue multi-hop QA benchmark. By holding the context length and the position of relevant information fixed while varying the number of retrieved documents, they isolate the specific challenge of presenting multiple related documents to the model at once. Each document set mixes essential documents, which contain the information needed to answer the question, with distractors that do not.
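To make the setup concrete, the following is a minimal Python sketch of one way to hold total context length fixed while varying the document count. The helper names, the fixed front placement of essential documents, and the characters-per-token heuristic are illustrative assumptions, not the authors' released code.

```python
import random

def build_context(essential_docs, distractor_pool, n_docs, token_budget, seed=0):
    """Assemble a context with exactly n_docs documents that fills roughly
    the same token budget regardless of n_docs. Essential documents keep
    fixed positions at the front; sampled distractors fill the rest."""
    rng = random.Random(seed)
    distractors = rng.sample(distractor_pool, n_docs - len(essential_docs))
    docs = list(essential_docs) + distractors
    # Give each document an equal share of the budget so total context
    # length stays roughly constant as n_docs varies (~4 characters/token).
    per_doc_chars = (token_budget // n_docs) * 4
    return "\n\n".join(doc[:per_doc_chars] for doc in docs)
```

The key design point is that any change in accuracy across conditions can then be attributed to the number of documents rather than to a longer or shorter context.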
Key Findings
The research shows a marked degradation in model performance as the number of documents grows. Notably, adding more documents to RAG contexts can decrease accuracy by up to 10%. This is demonstrated across multiple LLM families, including Llama-3.1, Qwen-2, and Gemma-2. While most models lost accuracy as more documents were included, Qwen-2 remained comparatively stable, suggesting a possible advantage in multi-document settings.
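A sweep over document counts could be scored along these lines, reusing the hypothetical build_context helper above; model.generate and the substring-match scoring are placeholder assumptions rather than the paper's evaluation protocol.

```python
def accuracy_by_doc_count(model, questions, doc_counts, token_budget=4000):
    """Evaluate the same questions under different document counts while
    total context length stays fixed; return accuracy per count."""
    results = {}
    for n_docs in doc_counts:
        correct = 0
        for q in questions:
            context = build_context(q["essential_docs"], q["distractor_pool"],
                                    n_docs, token_budget)
            prompt = f"{context}\n\nQuestion: {q['question']}\nAnswer:"
            answer = model.generate(prompt)
            # Crude scoring: count an answer correct if it contains the gold string.
            correct += int(q["gold_answer"].lower() in answer.lower())
        results[n_docs] = correct / len(questions)
    return results
```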
Practical and Theoretical Implications
From a practical standpoint, these findings indicate that RAG systems should be deliberate about how many documents they integrate. The analysis suggests a trade-off in retrieval strategy: whether to optimize for a broader but shallower document retrieval or a narrower, deeper selection. Simply retrieving every potentially relevant document can introduce confusion and overlap, complicating the LLM's ability to parse and synthesize the necessary information.
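In practice, the simplest lever for this trade-off is retrieval depth. The sketch below caps top-k and drops low-scoring candidates instead of padding the context with marginal documents; the retriever interface here is a hypothetical stand-in, not any specific library's API.

```python
def retrieve_for_rag(retriever, query, k=5, min_score=0.3):
    """Retrieve at most k documents, preferring a narrower, higher-confidence
    selection over a broad one padded with marginal distractors."""
    # Over-fetch, then filter by relevance score, keeping only the strongest k.
    candidates = retriever.search(query, top_k=2 * k)  # [(doc, score), ...]
    kept = [(doc, score) for doc, score in candidates if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:k]]
```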
On a theoretical level, these observations distinguish the challenges of multi-document inputs from those of long contexts in general. Processing multiple documents involves additional layers of complexity, such as handling redundant or overlapping information and resolving inter-document dependencies. This points to the need for mechanisms in LLMs that strengthen consolidation and synthesis without being misled by irrelevant or conflicting information.
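One concrete mitigation for the redundancy problem is to filter near-duplicate documents before generation. The sketch below uses Jaccard similarity over word shingles, which is an illustrative choice rather than a method from the paper.

```python
def deduplicate(docs, threshold=0.8, n=3):
    """Keep a document only if it is not too similar to any already-kept one,
    measured by Jaccard overlap of word n-gram shingles."""
    def shingles(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc)
        # Jaccard similarity = |intersection| / |union|; guard against empty sets.
        if all(len(s & t) / max(len(s | t), 1) < threshold for t in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept
```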
Future Directions
The paper opens several avenues for further research. Novel methodologies could be developed for processing multiple documents, distinguishing critical from non-essential information, and resolving conflicts within retrieved document sets. Additionally, extending the evaluation to diverse, noisy real-world corpora could contribute to more robust RAG systems.
Conclusion
By isolating the impact of multiple documents in RAG configurations, the authors provide compelling insight into the unique challenges this scenario presents. The work lays foundational knowledge for future developments in RAG systems and underscores the need for better ways to use multi-document retrieval with LLMs. The public release of the datasets and code further facilitates ongoing exploration and model refinement by other researchers.