Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check (2403.18243v1)

Published 27 Mar 2024 in cs.AI

Abstract: Retrieval-Augmented Generation (RAG) aims to generate more reliable and accurate responses by augmenting LLMs with vast and dynamic external knowledge. Most previous work focuses on using RAG for single-round question answering, while adapting RAG to the complex conversational setting, in which each question depends on the preceding context, remains understudied. In this paper, we propose a conversation-level RAG approach that incorporates fine-grained retrieval augmentation and self-check for conversational question answering (CQA). In particular, our approach consists of three components, namely a conversational question refiner, a fine-grained retriever, and a self-check based response generator, which work collaboratively on question understanding and relevant information acquisition in conversational settings. Extensive experiments demonstrate the clear advantages of our approach over state-of-the-art baselines. Moreover, we release a Chinese CQA dataset with new features, including reformulated questions, extracted keywords, retrieved paragraphs, and their helpfulness, which facilitates further research in RAG-enhanced CQA.

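The abstract describes a three-stage pipeline. To make the flow concrete, below is a minimal Python sketch of how the three components could fit together. Every name here (refine_question, retrieve_paragraphs, generate_with_self_check, the llm callable, and the retriever.search interface) is a hypothetical illustration assumed for this example, not the authors' implementation.

# Hypothetical sketch of the conversation-level RAG pipeline from the abstract.
# Assumes `llm` is a callable mapping a prompt string to a completion string,
# and `retriever` exposes a search(query, top_k) method; both are stand-ins.

def refine_question(llm, history, question):
    """Conversational question refiner: rewrite the context-dependent
    follow-up into a self-contained question with extracted keywords."""
    prompt = (
        "Dialogue history:\n" + "\n".join(history) +
        "\nRewrite the follow-up question so it stands alone, "
        "then list its keywords.\nQuestion: " + question
    )
    return llm(prompt)

def retrieve_paragraphs(retriever, refined_question, k=5):
    """Fine-grained retriever: fetch paragraph-level evidence
    rather than whole documents."""
    return retriever.search(refined_question, top_k=k)

def generate_with_self_check(llm, refined_question, paragraphs):
    """Self-check based generator: judge each paragraph's helpfulness
    before answering from the evidence that passes the check."""
    helpful = []
    for p in paragraphs:
        verdict = llm(
            "Question: " + refined_question + "\nParagraph: " + p +
            "\nIs this paragraph helpful for answering? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            helpful.append(p)
    context = "\n".join(helpful) if helpful else "(no helpful evidence found)"
    return llm("Answer using only this evidence:\n" + context +
               "\nQuestion: " + refined_question)

def conversational_rag(llm, retriever, history, question):
    """Compose the three components into one conversation-level RAG turn."""
    refined = refine_question(llm, history, question)
    paragraphs = retrieve_paragraphs(retriever, refined)
    return generate_with_self_check(llm, refined, paragraphs)

The per-paragraph helpfulness check in the sketch mirrors the helpfulness labels released with the paper's dataset; filtering evidence before generation is what distinguishes this pipeline from single-round RAG, which retrieves once against the raw question.
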
Authors (6)
  1. Linhao Ye (3 papers)
  2. Zhikai Lei (9 papers)
  3. Jianghao Yin (2 papers)
  4. Qin Chen (57 papers)
  5. Jie Zhou (687 papers)
  6. Liang He (202 papers)
Citations (8)