Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation (2402.18150v2)

Published 28 Feb 2024 in cs.CL, cs.AI, and cs.IR

Abstract: Retrieval-augmented generation (RAG) enhances LLMs by incorporating additional information from retrieval. However, studies have shown that LLMs still struggle to use the retrieved information effectively, sometimes ignoring it or being misled by it. The key reason is that LLM training does not explicitly teach the model how to utilize retrieved texts of varied quality. In this paper, we propose a novel perspective that considers the role of LLMs in RAG as an "Information Refiner": regardless of the correctness, completeness, or usefulness of the retrieved texts, LLMs can consistently integrate knowledge from the retrieved texts and the model parameters to generate texts that are more concise, accurate, and complete than the retrieved texts. To this end, we propose an information refinement training method named InFO-RAG that optimizes LLMs for RAG in an unsupervised manner. InFO-RAG is low-cost and general across various tasks. Extensive experiments on zero-shot prediction over 11 datasets spanning diverse tasks, including Question Answering, Slot-Filling, Language Modeling, Dialogue, and Code Generation, show that InFO-RAG improves the performance of LLaMA2 by an average of 9.39% relative points. InFO-RAG also shows advantages in in-context learning and robustness of RAG.
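
The abstract's framing of the LLM as an "Information Refiner" suggests a training setup in which the model conditions on retrieved passages of varied quality and is optimized, without human labels, to continue with text that is more accurate and complete than its input. Below is a minimal sketch of how such an unsupervised training example might be assembled; it is not the paper's implementation, and all class and function names (RefinementExample, build_prompt, build_labels) are illustrative assumptions.

```python
# Minimal sketch (assumptions, not InFO-RAG's actual code): assemble a training
# example where the model sees retrieved texts of mixed quality plus a prefix,
# and the language-modeling loss is applied only to the target continuation.

from dataclasses import dataclass
from typing import List


@dataclass
class RefinementExample:
    retrieved_texts: List[str]  # passages of varied quality (correct, partial, noisy)
    prefix: str                 # the prefix / query the model conditions on
    target: str                 # the continuation the model should refine toward


def build_prompt(example: RefinementExample) -> str:
    """Concatenate retrieved passages and the prefix into one input string."""
    passages = "\n".join(f"[{i + 1}] {t}" for i, t in enumerate(example.retrieved_texts))
    return f"Retrieved:\n{passages}\n\nContinue: {example.prefix}"


def build_labels(prompt_len: int, target_ids: List[int], ignore_index: int = -100) -> List[int]:
    """Mask prompt positions so the causal-LM loss only covers the target tokens."""
    return [ignore_index] * prompt_len + list(target_ids)


if __name__ == "__main__":
    ex = RefinementExample(
        retrieved_texts=[
            "The Eiffel Tower was completed in 1889.",   # correct passage
            "The Eiffel Tower is located in Berlin.",    # misleading passage
        ],
        prefix="The Eiffel Tower, completed in",
        target=" 1889, stands in Paris, France.",
    )
    print(build_prompt(ex))
    # In real training, prompt and target would be tokenized and the masked
    # labels fed to a causal LM; here we just show the masking shape.
    print(build_labels(prompt_len=5, target_ids=[101, 102, 103]))
```

The key design choice illustrated here is that targets come from plain corpora rather than human annotations, which is what makes the refinement objective unsupervised and low-cost.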

Authors (7)
  1. Shicheng Xu (36 papers)
  2. Liang Pang (94 papers)
  3. Mo Yu (117 papers)
  4. Fandong Meng (174 papers)
  5. Huawei Shen (119 papers)
  6. Xueqi Cheng (274 papers)
  7. Jie Zhou (687 papers)
Citations (5)