CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation (2411.00744v1)
Abstract: LLMs have demonstrated remarkable generation capabilities but often struggle to access up-to-date information, which can lead to hallucinations. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating knowledge from external databases, enabling more accurate and relevant responses. Due to the context window constraints of LLMs, it is impractical to input the entire external database context directly into the model. Instead, only the most relevant pieces of information, referred to as chunks, are selectively retrieved. However, current RAG research faces three key challenges. First, existing solutions often select each chunk independently, overlooking potential correlations among them. Second, in practice, the utility of chunks is non-monotonic, meaning that adding more chunks can decrease overall utility. Traditional methods emphasize maximizing the number of included chunks, which can inadvertently compromise performance. Third, each type of user query possesses unique characteristics that require tailored handling, an aspect that current approaches do not fully consider. To overcome these challenges, we propose CORAG, a cost-constrained retrieval optimization system for retrieval-augmented generation. We employ a Monte Carlo Tree Search (MCTS)-based policy framework to find optimal chunk combinations sequentially, allowing for a comprehensive consideration of correlations among chunks. Additionally, rather than viewing budget exhaustion as a termination condition, we integrate budget constraints into the optimization of chunk combinations, effectively addressing the non-monotonicity of chunk utility.
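To make the abstract's idea concrete, the following is a minimal, illustrative sketch of an MCTS-style search over chunk combinations under a cost budget. It is not the paper's implementation: the chunks, their costs/relevances, and the non-monotonic `utility` function (relevance sum minus a redundancy penalty) are all hypothetical, and the budget enters as a feasibility constraint on expansion rather than a stopping rule, mirroring the abstract's framing.

```python
import math
import random

# Hypothetical chunks: (id, cost, relevance). Values are illustrative only.
CHUNKS = [("c1", 3, 0.9), ("c2", 2, 0.7), ("c3", 4, 0.8), ("c4", 1, 0.3)]
BUDGET = 6

def utility(selected):
    # Toy non-monotonic utility: summed relevance minus a redundancy
    # penalty, so adding more chunks can *decrease* overall utility.
    rel = sum(r for _, _, r in selected)
    redundancy = 0.15 * len(selected) * (len(selected) - 1)
    return rel - redundancy

class Node:
    def __init__(self, selected, parent=None):
        self.selected = selected      # tuple of chunks chosen so far
        self.parent = parent
        self.children = {}            # chunk id -> child Node
        self.visits = 0
        self.value = 0.0

    def untried(self):
        # Only chunks that still fit the budget are legal expansions:
        # the budget constrains the search space, it does not end it.
        cost = sum(c for _, c, _ in self.selected)
        return [ch for ch in CHUNKS
                if ch not in self.selected
                and ch[0] not in self.children
                and cost + ch[1] <= BUDGET]

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def mcts(iterations=500):
    root = Node(())
    best = ((), utility(()))
    for _ in range(iterations):
        node = root
        # Selection: descend via UCB while the node is fully expanded.
        while not node.untried() and node.children:
            node = max(node.children.values(), key=lambda ch: ucb(ch, node))
        # Expansion: add one feasible chunk, if any remain.
        options = node.untried()
        if options:
            ch = random.choice(options)
            child = Node(node.selected + (ch,), parent=node)
            node.children[ch[0]] = child
            node = child
        # Evaluation: score the combination directly (a stand-in for
        # whatever utility estimate the real system would use).
        reward = utility(node.selected)
        if reward > best[1]:
            best = (node.selected, reward)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best

random.seed(0)  # for reproducibility of this toy run
combo, score = mcts()
```

With these toy values, the search settles on the pair {c1, c2} (total cost 5, within budget), which beats any larger combination because the redundancy penalty makes utility non-monotonic in the number of chunks.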
Authors: Ziting Wang, Haitao Yuan, Wei Dong, Gao Cong, Feifei Li