
BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence (2402.12174v2)

Published 19 Feb 2024 in cs.CL

Abstract: Retrieval-augmented LLMs have demonstrated efficacy in knowledge-intensive tasks such as open-domain QA, addressing inherent challenges of knowledge updating and factual inadequacy. However, inconsistencies between the retrieved knowledge and the knowledge the LLM actually needs lead to a decline in the LLM's answer quality. This paper introduces BIDER, an approach that refines retrieved documents into Key Supporting Evidence (KSE) through knowledge synthesis, supervised fine-tuning (SFT), and preference alignment. We train BIDER to craft KSE, then align its output with the LLM's information-acquisition preferences through reinforcement learning. Evaluations across five datasets show that BIDER boosts LLMs' answer quality by 7% while reducing the length of the retrieved input by 80%, outperforming existing methods. The proposed KSE simulation effectively equips LLMs with the essential information needed for accurate question answering.
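At a high level, BIDER sits between the retriever and the reader LLM: retrieved documents are distilled into a short Key Supporting Evidence string, and only that evidence (rather than the full documents) is passed to the LLM. The sketch below is a minimal illustration of this inference-time flow only; the `refine_to_kse` helper, the `facebook/bart-base` stand-in checkpoint, and the prompt format are assumptions for illustration, not the authors' released implementation, and the training stages (knowledge synthesis, SFT, and RL-based preference alignment) are not shown.

```python
# Minimal sketch of a BIDER-style retrieve -> refine -> generate pipeline.
# Model names, prompt formats, and the refiner checkpoint are illustrative
# assumptions, not the authors' released artifacts.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


def refine_to_kse(question: str, documents: list[str],
                  refiner_name: str = "facebook/bart-base") -> str:
    """Distill retrieved documents into Key Supporting Evidence (KSE).

    In BIDER the refiner is trained via knowledge synthesis + SFT and then
    aligned to the reader LLM's preferences with RL; here a generic seq2seq
    model is used as a stand-in, so only the inference path is shown.
    """
    tokenizer = AutoTokenizer.from_pretrained(refiner_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(refiner_name)

    # Concatenate the question with the retrieved documents (assumed format).
    prompt = f"question: {question} documents: {' '.join(documents)}"
    inputs = tokenizer(prompt, return_tensors="pt",
                       truncation=True, max_length=1024)

    # Generate a compressed evidence string much shorter than the input.
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def answer_with_kse(question: str, documents: list[str], llm) -> str:
    """Feed the compressed evidence, not the full documents, to the reader LLM."""
    kse = refine_to_kse(question, documents)
    return llm(f"Context: {kse}\nQuestion: {question}\nAnswer:")
```

Because the reader LLM only sees the distilled evidence, the prompt it consumes is far shorter than the raw retrieved documents, which is how the paper reports roughly an 80% reduction in input length alongside improved answer quality.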

Authors (4)
  1. Jiajie Jin (14 papers)
  2. Yutao Zhu (63 papers)
  3. Yujia Zhou (34 papers)
  4. Zhicheng Dou (113 papers)
Citations (14)