REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering (2402.17497v2)

Published 27 Feb 2024 in cs.CL and cs.IR

Abstract: Given the limited internal parametric knowledge of LLMs, retrieval-augmented generation (RAG) has been widely used to extend their knowledge scope. Despite extensive efforts on RAG research, in existing methods LLMs cannot precisely assess the relevance of retrieved documents, which likely leads to misleading or even incorrect utilization of external knowledge (e.g., retrieved documents). To address this issue, in this paper we propose REAR, a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA). Our key motivation is to enhance LLMs' self-awareness of the reliability of external knowledge, so that they can adaptively utilize it in RAG systems. Specifically, we develop a novel architecture for LLM-based RAG systems that incorporates a specially designed assessment module to precisely gauge the relevance of retrieved documents. Furthermore, we propose an improved training method based on bi-granularity relevance fusion and noise-resistant training. By combining improvements in both architecture and training, REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents. Experiments on four open-domain QA tasks show that REAR significantly outperforms a number of previous competitive RAG approaches. Our code can be accessed at https://github.com/RUCAIBox/REAR.

Enhancing LLMs' Awareness of Source Relevance in Retrieval-Augmented Generation Systems

Introduction to REAR

In the field of retrieval-augmented generation (RAG) systems, a persistent challenge has been the effective utilization of external knowledge. Despite the profound capabilities LLMs exhibit across many domains, their application to knowledge-intensive tasks such as open-domain question answering (QA) starkly reveals limitations in their innate knowledge and real-time adaptability. These systems often struggle to accurately discern the relevance of retrieved documents, which can lead to misinformation in generated outputs. Addressing this challenge, the paper proposes REAR (RElevance-Aware Retrieval-augmented approach for open-domain QA), a framework designed to significantly boost LLMs' effectiveness in open-domain QA by enhancing their awareness of source relevance. This is achieved through innovations in both model architecture and training methodology, focused on precise relevance assessment and adaptive knowledge utilization.
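
At a high level, relevance-aware adaptive utilization can be pictured as scoring each retrieved document and conditioning generation on a document only when the model judges it reliable. The sketch below is a minimal, hypothetical rendering of that idea in Python; `score_relevance`, `generate_answer`, and the fixed-threshold fallback are illustrative stand-ins, not REAR's actual decision rule.

```python
from typing import Callable, List, Optional, Tuple

def answer_with_rag(
    question: str,
    docs: List[str],
    score_relevance: Callable[[str, str], float],          # (question, doc) -> score
    generate_answer: Callable[[str, Optional[str]], str],  # (question, context) -> answer
    threshold: float = 0.5,
) -> str:
    """Answer a question, conditioning on a retrieved document only if it is
    judged sufficiently relevant; otherwise rely on parametric knowledge."""
    if not docs:
        return generate_answer(question, None)
    # Score every retrieved document for relevance to the question.
    scored: List[Tuple[float, str]] = sorted(
        ((score_relevance(question, d), d) for d in docs), reverse=True
    )
    best_score, best_doc = scored[0]
    # Adaptive utilization: fall back to the LLM's own knowledge when no
    # retrieved document clears the relevance threshold.
    context = best_doc if best_score >= threshold else None
    return generate_answer(question, context)
```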

Architecture and Methodology

REAR introduces a specialized architecture that incorporates a rank head explicitly designed for relevance assessment, integrated within the LLM itself. This component captures and exploits relevance signals from retrieved documents, underpinning the model's ability to accurately judge the pertinence of external knowledge. The paper complements this architectural advance with a refined training approach that combines bi-granularity relevance fusion and noise-resistant training. Together, these strategies improve the model's ability to process fine-grained relevance cues and bolster its resistance to noise in retrieved documents. Through these dual avenues of enhancement, REAR elevates the accuracy and reliability of the content LLMs generate in response to queries.
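
The following is a minimal PyTorch sketch of the two ingredients this paragraph names, under stated assumptions: the rank head is modeled as a single linear layer over the final hidden state of a HuggingFace-style causal LM, and bi-granularity relevance fusion is rendered as a coarse binary relevance loss combined with a fine-grained pairwise ranking loss. All names (`RelevanceAwareLM`, `bi_granularity_loss`) are illustrative; REAR's exact formulation is given in the paper and its repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelevanceAwareLM(nn.Module):
    """Hypothetical wrapper: a causal LM backbone plus a scalar rank head."""

    def __init__(self, base_lm: nn.Module, hidden_size: int):
        super().__init__()
        self.base_lm = base_lm                      # HuggingFace-style decoder-only LM
        self.rank_head = nn.Linear(hidden_size, 1)  # hidden state -> relevance logit

    def score_relevance(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Encode "[query] [document]" sequences and read the final position's
        # hidden state (assumes no right padding in the batch).
        out = self.base_lm(input_ids, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1, :]   # (batch, hidden_size)
        return self.rank_head(last_hidden).squeeze(-1)  # (batch,) relevance logits

def bi_granularity_loss(pos_scores: torch.Tensor,
                        neg_scores: torch.Tensor,
                        margin: float = 1.0) -> torch.Tensor:
    """Illustrative fusion of coarse (binary) and fine-grained (ranking) signals."""
    # Coarse granularity: relevant documents classified as 1, irrelevant as 0.
    coarse = F.binary_cross_entropy_with_logits(
        torch.cat([pos_scores, neg_scores]),
        torch.cat([torch.ones_like(pos_scores), torch.zeros_like(neg_scores)]),
    )
    # Fine granularity: every relevant document should outscore every
    # irrelevant one by at least `margin` (pairwise hinge ranking loss).
    fine = F.relu(margin - (pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0))).mean()
    return coarse + fine
```

One plausible reading of noise-resistant training, under the same assumptions, is that this loss is optimized over retrieval results that deliberately include irrelevant documents, so the scorer learns to remain robust to them.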

Experimental Results

The efficacy of REAR is substantiated through extensive experiments on four open-domain QA tasks. The framework consistently outperformed several established RAG models, demonstrating superior relevance assessment and utilization of external knowledge. Notably, REAR remained robust against interference from irrelevant documents, a significant advance over conventional models. These outcomes attest to the framework's ability to use retrieved documents judiciously. Detailed analyses further clarify the contributions of individual components, namely the rank head, bi-granularity relevance training, and noise-resistant training, to the observed performance gains.

Implications and Future Directions

The introduction of REAR paves a promising path toward resolving the longstanding challenge of effectively utilizing external knowledge in LLMs, specifically within the RAG setting. By prioritizing self-awareness of source relevance, REAR lays a foundation that future work can build on to explore finer-grained relevance discernment, such as passage- or sentence-level augmentation. The framework's applicability to a wider array of knowledge-intensive tasks also presents fertile ground for further research, potentially extending beyond open-domain QA to other tasks in the KILT benchmark.

Conclusion

In summary, REAR represents a significant stride towards enhancing the interplay between LLMs and external knowledge sources in RAG systems. Through its novel architecture and training approaches, REAR not only fosters a deeper understanding of document relevance but also ensures a more adept and noise-resistant utilization of external knowledge. Consequently, this framework heralds a notable improvement in the accuracy and reliability of responses generated by LLMs in open-domain QA tasks, marking a pivotal advancement in the field of AI research.

Authors (6)
  1. Yuhao Wang (144 papers)
  2. Ruiyang Ren (18 papers)
  3. Junyi Li (92 papers)
  4. Wayne Xin Zhao (196 papers)
  5. Jing Liu (525 papers)
  6. Ji-Rong Wen (299 papers)