
M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions

Published 26 May 2024 in cs.CL and cs.IR | (2405.16420v1)

Abstract: Retrieval-Augmented Generation (RAG) enhances LLMs by retrieving relevant memories from an external database. However, existing RAG methods typically organize all memories in a whole database, potentially limiting focus on crucial memories and introducing noise. In this paper, we introduce a multiple partition paradigm for RAG (called M-RAG), where each database partition serves as a basic unit for RAG execution. Based on this paradigm, we propose a novel framework that leverages LLMs with Multi-Agent Reinforcement Learning to optimize different language generation tasks explicitly. Through comprehensive experiments conducted on seven datasets, spanning three language generation tasks and involving three distinct LLM architectures, we confirm that M-RAG consistently outperforms various baseline methods, achieving improvements of 11%, 8%, and 12% for text summarization, machine translation, and dialogue generation, respectively.


Summary

  • The paper introduces a novel multi-partition retrieval framework that reduces data noise and enhances LLM performance.
  • It employs multi-agent reinforcement learning with two agents, Agent-S for partition selection and Agent-R for iterative memory refinement, to optimize text generation.
  • Experimental results show 11% improvement in summarization, 8% in translation, and 12% in dialogue tasks, demonstrating its practical impact.

Introduction

The paper "M-RAG: Reinforcing LLM Performance through Retrieval-Augmented Generation with Multiple Partitions" (2405.16420) introduces a novel approach to enhance the performance of LLMs by structuring RAG processes across multiple database partitions. This methodology addresses the limitation of treating an entire database as a single entity, which can introduce noise and dilute focus on relevant data. By adopting a partition-based retrieval strategy, the paper suggests that the retrieval process can become more fine-grained, thereby optimizing the generative tasks of LLMs.

Methodology

The M-RAG framework is constructed around the concept of multiple database partitions, each serving as an individualized unit for conducting RAG tasks. This is paired with Multi-Agent Reinforcement Learning to facilitate enhanced language generation. The framework's efficacy is substantiated through experiments across diverse datasets encompassing three major language generation tasks: text summarization, machine translation, and dialogue generation. These tasks were tested on various LLM architectures, establishing the consistency and versatility of M-RAG.
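The core idea of executing RAG per partition can be sketched as follows. This is an illustrative outline, not the authors' code: it assumes vector-similarity retrieval and a pluggable partition selector (the role the paper assigns to Agent-S), with hypothetical function names.

```python
import numpy as np

def cosine_top_k(query_vec, vectors, k):
    # cosine similarity between the query and each memory vector
    sims = vectors @ query_vec / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return np.argsort(-sims)[:k]

def m_rag_retrieve(query_vec, partitions, select_partition, k=3):
    # Retrieve top-k memories from one selected partition instead of
    # searching the whole database, localizing retrieval and reducing noise.
    p = select_partition(query_vec, partitions)
    idx = cosine_top_k(query_vec, partitions[p], k)
    return p, partitions[p][idx]
```

In use, `select_partition` would be a learned policy; here even a fixed choice demonstrates that retrieval touches only one partition's vectors.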

Database Partitioning Strategy

The research proposes several partitioning strategies, including randomization, clustering, indexing, and category-based distribution, to handle large databases effectively. The choice of partitioning strategy is instrumental in achieving targeted and efficient retrieval: it localizes the retrieval operation to the most pertinent data partition, thereby reducing noise and enhancing retrieval precision (Figure 1).

Figure 1: Comparison of database partitioning strategies for language generation tasks.
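As a concrete instance of one strategy, clustering-based partitioning can be sketched with a plain k-means loop. This is a minimal illustration of the idea (not the paper's implementation), assuming memories are already embedded as vectors:

```python
import numpy as np

def kmeans_partition(vectors, n_partitions, n_iters=20, seed=0):
    # Split the memory database into clusters so retrieval can later be
    # localized to the single most relevant partition.
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_partitions, replace=False)]
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(n_iters):
        # assign each memory vector to its nearest centroid
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned members
        for c in range(n_partitions):
            members = vectors[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return [vectors[labels == c] for c in range(n_partitions)], centroids
```

Category-based distribution would instead group memories by a metadata label, and indexing-based schemes (e.g. IVF-style inverted lists in vector databases) follow the same localize-then-search pattern.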

Multi-Agent Reinforcement Learning

In M-RAG, a dual-agent system consisting of Agent-S and Agent-R is employed. Agent-S selects the most promising partition for a given query, treating partition selection as a multi-armed bandit problem. Agent-R refines the retrieved memories, iteratively improving the quality of the generative output through reinforcement learning. Both agents are optimized with Deep Q-Networks (DQN) to maximize cumulative reward, aligning the training objective closely with successful text generation outcomes.
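The bandit view of partition selection can be made concrete with a simple epsilon-greedy policy. This is a deliberately simplified stand-in for Agent-S (the paper optimizes its agents with DQN, not a tabular bandit); the reward would come from downstream generation quality:

```python
import random

class PartitionBandit:
    """Epsilon-greedy selection over database partitions: a simplified
    stand-in for Agent-S (the paper trains its agents with DQN)."""

    def __init__(self, n_partitions, epsilon=0.1):
        self.counts = [0] * n_partitions
        self.values = [0.0] * n_partitions  # running mean reward per partition
        self.epsilon = epsilon

    def select(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))  # explore
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def update(self, partition, reward):
        # incremental mean update for the chosen partition; the reward could
        # be a generation-quality score (e.g. ROUGE or BLEU) obtained when
        # conditioning on memories from that partition
        self.counts[partition] += 1
        self.values[partition] += (reward - self.values[partition]) / self.counts[partition]
```

Over repeated queries, partitions that yield higher-quality generations accumulate higher value estimates and are selected more often, which is the behavior the learned Agent-S policy generalizes.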

Results and Implications

The experimental results underscore the substantial improvements afforded by the M-RAG framework. Notably, the improvements recorded in text summarization, machine translation, and dialogue generation tasks were 11%, 8%, and 12%, respectively, when compared to existing RAG methods. This performance leap is attributed to the precision in memory retrieval and the continuous refinement processes facilitated by the dual agents.

The theoretical implications of this research are significant as they offer a paradigm shift in how RAG is conceptualized. By moving away from monolithic database structures to partitioned formats, more nuanced and contextually relevant data can be retrieved by LLMs. Practically, the applications are vast, enhancing accuracy and efficiency in tasks ranging from customer service chatbots to sophisticated multilingual translation systems.

Conclusion

The introduction of M-RAG represents a pivotal advancement in the domain of LLM optimization. By leveraging multi-partition strategies and reinforcement learning, it challenges conventional RAG methodologies and sets a new benchmark for generative accuracy and efficiency. Future developments could focus on refining partitioning strategies further and exploring potential integrations with other advanced machine learning techniques to sustain and amplify these performance gains in diverse real-world applications.
