More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG (2503.04388v1)

Published 6 Mar 2025 in cs.CL

Abstract: Retrieval-augmented generation (RAG) provides LLMs with relevant documents. Although previous studies noted that retrieving many documents can degrade performance, they did not isolate how the quantity of documents affects performance while controlling for context length. We evaluate various LLMs on custom datasets derived from a multi-hop QA task. We keep the context length and position of relevant information constant while varying the number of documents, and find that increasing the document count in RAG settings poses significant challenges for LLMs. Additionally, our results indicate that processing multiple documents is a separate challenge from handling long contexts. We also make the datasets and code available: https://github.com/shaharl6000/MoreDocsSameLen .


Summary

An Analysis of Retrieval-Augmented Generation with Multiple Documents

The paper "More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG" investigates how retrieval-augmented generation (RAG) systems perform when the number of retrieved documents increases while the total context length is held fixed. The work is framed around multi-hop question answering (QA), where LLMs must synthesize information spread across multiple documents.

Methodology

The authors test several state-of-the-art LLMs on datasets derived from the MuSiQue multi-hop QA benchmark. By holding the context length and the position of the relevant information constant while varying the number of documents, they isolate the challenge posed by presenting the model with many documents at once. Each document set mixes essential documents, which contain the information needed to answer the question, with distractors that are irrelevant to the query.
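The construction described above can be sketched as follows. This is a minimal, hypothetical illustration of the controlled setup, not the paper's actual code: documents are plain strings, "length" is counted in whitespace tokens, and the equal per-document budget and `<pad>` token are simplifying assumptions.

```python
# Hypothetical sketch: build a RAG context with a fixed token budget
# while varying the number of documents, keeping essential documents
# at a fixed (leading) position, in the spirit of the paper's setup.

def build_context(essential_docs, distractor_docs, n_docs, budget):
    """Select n_docs documents (all essential documents first, then
    distractors) and truncate or pad each one so the total token
    count equals `budget`."""
    if n_docs < len(essential_docs):
        raise ValueError("n_docs must accommodate all essential documents")
    docs = essential_docs + distractor_docs[: n_docs - len(essential_docs)]
    per_doc = budget // n_docs  # equal share of the token budget per document
    trimmed = []
    for doc in docs:
        tokens = doc.split()
        tokens = (tokens + ["<pad>"] * per_doc)[:per_doc]  # pad then truncate
        trimmed.append(" ".join(tokens))
    return "\n\n".join(trimmed)
```

With this helper, sweeping `n_docs` changes only how many distinct documents share the same budget, which is exactly the variable the paper isolates.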

Key Findings

The research shows a significant degradation in model performance as the number of documents grows: adding more documents to a RAG context of fixed length can reduce accuracy by up to 10%. This holds across multiple LLM families, including Llama-3.1, Qwen-2, and Gemma-2. While most models degraded as more documents were included, Qwen-2 remained stable, suggesting it may handle multi-document setups more robustly.
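The kind of sweep behind these findings can be sketched as below. This is a hedged illustration, not the paper's evaluation code: `ask_model`, the `contexts` mapping, and exact-match scoring are all placeholder assumptions.

```python
# Hypothetical evaluation sweep: measure QA accuracy as the number of
# documents varies while context length stays fixed. `ask_model` is a
# placeholder callable (question, context) -> predicted answer string.

def accuracy_by_doc_count(examples, doc_counts, ask_model):
    """For each document count, query the model on every example and
    record the fraction of exact-match (case-insensitive) answers."""
    results = {}
    for n in doc_counts:
        correct = 0
        for ex in examples:
            context = ex["contexts"][n]  # pre-built context with n documents
            pred = ask_model(question=ex["question"], context=context)
            correct += int(pred.strip().lower() == ex["answer"].strip().lower())
        results[n] = correct / len(examples)
    return results
```

Plotting `results` against the document count would reproduce the paper's central comparison: same context length, different document counts.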

Practical and Theoretical Implications

From a practical standpoint, these findings indicate that RAG systems should be deliberate about how many documents they include. The analysis suggests a trade-off in retrieval strategy: optimize for a broader but shallower set of documents, or a narrower, deeper selection. Simply retrieving every potentially relevant document can introduce confusion and overlap, making it harder for the LLM to parse and synthesize the information it actually needs.

On a theoretical level, these observations clarify how multi-document settings differ from long-context settings. Processing multiple documents adds layers of complexity, such as handling redundant or overlapping information and resolving inter-document dependencies. This points to a need for mechanisms that strengthen LLMs' consolidation and synthesis abilities without letting them be misled by irrelevant or conflicting information.

Future Directions

The paper opens several avenues for further research: new methods for processing multiple documents, for distinguishing critical from non-essential information, and for resolving conflicts within retrieved document sets. Extending evaluation to diverse, noisy real-world corpora could also contribute to more robust RAG systems.

Conclusion

By isolating the impact of document count in RAG configurations, the authors shed light on the distinct challenges this scenario presents. The work provides a foundation for future RAG development and underscores the need for better ways to use multi-document retrieval with large-scale LLMs. The released datasets and code further enable ongoing exploration and model refinement by other researchers.
