Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation (2404.06809v3)

Published 10 Apr 2024 in cs.CL

Abstract: The rapid development of LLMs has led to the widespread adoption of Retrieval-Augmented Generation (RAG), which integrates external knowledge to alleviate knowledge bottlenecks and mitigate hallucinations. However, the existing RAG paradigm inevitably suffers from flawed information introduced during the retrieval phase, which diminishes the reliability and correctness of the generated outputs. In this paper, we propose Credibility-aware Generation (CAG), a universally applicable framework designed to mitigate the impact of flawed information in RAG. At its core, CAG aims to equip models with the ability to discern and process information based on its credibility. To this end, we propose a data transformation framework that generates training data conditioned on credibility, thereby endowing models with the capability of CAG. Furthermore, to accurately evaluate models' CAG capabilities, we construct a comprehensive benchmark covering three critical real-world scenarios. Experimental results demonstrate that our model can effectively understand and utilize credibility for generation, significantly outperforms other retrieval-augmented models, and exhibits resilience against the disruption caused by noisy documents, maintaining robust performance. Moreover, our model supports customized credibility, offering a wide range of potential applications.
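
The abstract describes CAG only at a high level. As a concrete illustration, the sketch below shows one way credibility annotations could be surfaced to a generator at inference time: each retrieved passage is prefixed with a credibility label before being placed in the prompt. The label set and the `build_cag_prompt` helper are assumptions introduced here for illustration; the paper's actual data transformation framework and training procedure are not reproduced.

```python
# Minimal sketch of credibility-aware prompt construction for RAG.
# The label set ("high"/"medium"/"low") and helper names are hypothetical,
# not the authors' implementation.
from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    text: str
    credibility: str  # assumed discrete label, e.g. "high", "medium", "low"


def build_cag_prompt(question: str, docs: list[RetrievedDoc]) -> str:
    """Prefix each retrieved passage with its credibility label so the
    generator can weigh sources instead of treating them uniformly."""
    context = "\n".join(
        f"[Credibility: {d.credibility}] {d.text}" for d in docs
    )
    return (
        "Answer the question using the documents below. "
        "Prefer information from higher-credibility documents and "
        "discount claims that appear only in low-credibility ones.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        RetrievedDoc("The Eiffel Tower is 330 m tall.", "high"),
        RetrievedDoc("The Eiffel Tower is 500 m tall.", "low"),
    ]
    print(build_cag_prompt("How tall is the Eiffel Tower?", docs))
```

A customized-credibility setting, as mentioned in the abstract, would amount to letting the user supply or override the labels attached to each document before the prompt is built.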
