CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation (2411.00744v1)

Published 1 Nov 2024 in cs.DB, cs.CL, and cs.IR

Abstract: LLMs have demonstrated remarkable generation capabilities but often struggle to access up-to-date information, which can lead to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating knowledge from external databases, enabling more accurate and relevant responses. Due to the context window constraints of LLMs, it is impractical to feed the entire external database into the model; instead, only the most relevant pieces of information, referred to as chunks, are selectively retrieved. However, current RAG research faces three key challenges. First, existing solutions often select each chunk independently, overlooking potential correlations among them. Second, in practice the utility of chunks is non-monotonic: adding more chunks can decrease overall utility. Traditional methods that emphasize maximizing the number of included chunks can therefore inadvertently compromise performance. Third, each type of user query has unique characteristics that require tailored handling, an aspect that current approaches do not fully consider. To overcome these challenges, we propose CORAG, a cost-constrained retrieval optimization system for retrieval-augmented generation. We employ a Monte Carlo Tree Search (MCTS)-based policy framework to find optimal chunk combinations sequentially, allowing for a comprehensive consideration of correlations among chunks. Additionally, rather than viewing budget exhaustion as a termination condition, we integrate budget constraints into the optimization of chunk combinations, effectively addressing the non-monotonicity of chunk utility.

Authors (5)
  1. Ziting Wang (9 papers)
  2. Haitao Yuan (14 papers)
  3. Wei Dong (106 papers)
  4. Gao Cong (54 papers)
  5. Feifei Li (47 papers)

Summary

Overview of CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation

The paper "CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation" presents a methodological advancement in the field of Retrieval-Augmented Generation (RAG) by addressing three notable challenges: chunk correlation, non-monotonic utility of chunk inclusion, and the diversity of user queries. The authors propose the CORAG system which leverages a Monte Carlo Tree Search (MCTS)-based framework to improve retrieval efficiency in RAG systems, with a focus on optimizing chunk combinations under cost constraints.

Key Challenges in RAG Systems

  1. Chunk Correlation and Redundancy
    • Previous RAG approaches often score each retrieved chunk independently or aggregate chunks uniformly by cluster, disregarding inter-chunk correlations. This can introduce substantial redundancy, since multiple chunks may convey overlapping information. By overlooking these interrelations, a RAG system can fail to assemble the most informative combination of chunks for a given user query.
  2. Non-Monotonicity of Chunk Utility
    • Traditional methods maximize the number of selected chunks under the assumption that more information yields better utility, which has been shown to be suboptimal. Because chunk utility is non-monotonic, excessive inclusion introduces noise and dilutes the signal: adding chunks can actively degrade the model's generation quality. An approach that accounts for the diminishing, and eventually negative, returns of chunk addition is therefore critical; a toy illustration of this effect follows the list.
  3. Diversity in Query Characterization
    • Different user queries inherently have unique requirements when retrieving pertinent information. Existing systems lack retrieval strategies tailored to each query type, which leads to inefficiencies. For practical applications, RAG systems need to incorporate query-specific behavior to enhance adaptability.
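
To make the non-monotonicity concrete, here is a toy illustration (ours, not from the paper): model a combination's utility as total relevance minus a pairwise redundancy penalty. The relevance and overlap numbers below are invented purely for illustration.

```python
# Toy model of non-monotonic chunk utility (illustrative numbers only).
# Utility = sum of per-chunk relevance - penalty for pairwise content overlap.

from itertools import combinations

relevance = {"c1": 0.9, "c2": 0.8, "c3": 0.7, "c4": 0.6}
overlap = {("c1", "c2"): 0.1, ("c1", "c3"): 0.6, ("c2", "c3"): 0.5,
           ("c1", "c4"): 0.7, ("c2", "c4"): 0.6, ("c3", "c4"): 0.8}

def utility(chunks):
    gain = sum(relevance[c] for c in chunks)
    penalty = sum(overlap.get(tuple(sorted(p)), 0.0)
                  for p in combinations(chunks, 2))
    return gain - penalty

for k in range(1, 5):
    combo = [f"c{i}" for i in range(1, k + 1)]
    print(combo, round(utility(combo), 2))
# Utility per prefix: [c1] -> 0.9, [c1,c2] -> 1.6, [c1,c2,c3] -> 1.2,
# [c1..c4] -> -0.3: utility peaks, then falls as redundant chunks are
# added -- more chunks is not always better.
```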

Contributions of CORAG

  • MCTS-Based Framework for Optimal Chunk Selection
    • CORAG introduces a novel MCTS approach that jointly addresses chunk correlation and non-monotonicity. The system uses MCTS to search over chunk combinations and their ordering, so the selection process accounts for inter-chunk relationships rather than scoring each chunk in isolation (a minimal sketch of such a search loop appears after this list).
  • Integration of Cost Constraints in Retrieval Optimization
    • By embedding cost considerations directly into the optimization routine, the system dynamically balances the benefit of including a chunk against its computational cost, rather than simply terminating retrieval once the budget is exhausted.
  • Configuration Agent for Query Adaptation
    • To address the diversity of query characteristics, CORAG includes a configuration agent that selects the MCTS configuration based on the query type. The agent is trained with contrastive learning to adapt retrieval behavior to the query domain, improving both retrieval effectiveness and the efficiency of generated responses (a hypothetical routing sketch follows the MCTS example below).
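
The paper does not include reference code, so the following is a minimal, self-contained sketch of what budget-constrained MCTS over chunk combinations might look like. The token costs, budget, UCB constant, and random-rollout policy are all our assumptions, not CORAG's actual implementation; `score` can be any utility function over a chunk combination, e.g. the toy `utility` above.

```python
import math
import random

# Assumed cost model (not from the paper): each chunk has a token cost,
# and the selected combination must fit a fixed token budget.
COSTS = {"c1": 120, "c2": 90, "c3": 150, "c4": 200}
BUDGET = 300

class Node:
    def __init__(self, selected, parent=None):
        self.selected = selected          # frozenset of chunk ids
        self.parent = parent
        self.children = {}                # chunk id -> Node
        self.visits = 0
        self.value = 0.0                  # sum of rollout utilities

    def legal_actions(self):
        spent = sum(COSTS[c] for c in self.selected)
        return [c for c in COSTS
                if c not in self.selected and spent + COSTS[c] <= BUDGET]

def uct(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def rollout(selected, score):
    # Randomly extend the combination, sometimes stopping early:
    # because utility is non-monotonic, smaller sets can score higher.
    sel = set(selected)
    while True:
        spent = sum(COSTS[c] for c in sel)
        options = [c for c in COSTS
                   if c not in sel and spent + COSTS[c] <= BUDGET]
        if not options or random.random() < 0.3:
            break
        sel.add(random.choice(options))
    return score(sel)

def mcts(score, iterations=500):
    root = Node(frozenset())
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(node.legal_actions()):
            node = max(node.children.values(), key=lambda ch: uct(ch, node))
        # Expansion: add one untried chunk, if any fits the budget.
        untried = [a for a in node.legal_actions() if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = Node(node.selected | {a}, parent=node)
            node = node.children[a]
        # Simulation + backpropagation.
        reward = rollout(node.selected, score)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the best combination found anywhere in the tree.
    best, best_u = frozenset(), score(frozenset())
    stack = [root]
    while stack:
        n = stack.pop()
        u = score(n.selected)
        if u > best_u:
            best, best_u = n.selected, u
        stack.extend(n.children.values())
    return best, best_u
```

With the toy `utility` above as `score`, `mcts(utility)` converges on `{"c1", "c2"}` in this small setup: the highest-utility combination that fits the budget, not the largest one.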
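
To show what query-adaptive routing might look like at inference time, here is a hypothetical sketch of the configuration agent's interface: map a query embedding to the nearest query-type prototype and return per-type MCTS settings. CORAG learns this mapping with contrastive learning, which we do not reproduce here; the prototypes, query types, and settings below are invented.

```python
import numpy as np

# Hypothetical query-type centroids (in practice, learned embeddings).
PROTOTYPES = {
    "factoid":   np.array([1.0, 0.0, 0.0]),
    "multi_hop": np.array([0.0, 1.0, 0.0]),
    "summarize": np.array([0.0, 0.0, 1.0]),
}
# Hypothetical per-type MCTS settings (iterations, UCB constant, budget).
CONFIGS = {
    "factoid":   {"iterations": 200, "c": 0.7, "budget": 200},
    "multi_hop": {"iterations": 800, "c": 1.4, "budget": 400},
    "summarize": {"iterations": 400, "c": 1.0, "budget": 600},
}

def route(query_embedding):
    """Return the nearest query type (by cosine similarity) and its config."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    qtype = max(PROTOTYPES, key=lambda t: cos(query_embedding, PROTOTYPES[t]))
    return qtype, CONFIGS[qtype]

qtype, cfg = route(np.array([0.1, 0.9, 0.2]))  # -> ("multi_hop", {...})
```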

Experimental Results

In the reported experiments, CORAG delivers significant improvements in retrieval effectiveness and efficiency over existing baselines, achieving up to a 30% improvement over traditional top-k selection and clustering methods. These results highlight CORAG's ability to handle long-context tasks while remaining scalable and cost-effective in its use of computational resources.

Implications and Future Directions

The proposed CORAG system sets a precedent for treating retrieval as a cost-constrained optimization problem that considers the intricacies of chunk selection beyond conventional methods. Its success in RAG suggests potential extensions to other domains requiring retrieval optimization where cost constraints and dynamic query handling are pertinent. Future work could refine the MCTS strategy further or embed similar methodologies in other RAG components.

Overall, CORAG represents a robust and well-documented advance in RAG pipelines, balancing retrieval accuracy against computational limits while catering to the diversity of user queries and contextual requirements.
