
Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering (2402.16313v3)

Published 26 Feb 2024 in cs.CL and cs.AI

Abstract: Open-ended question answering requires models to find appropriate evidence to form well-reasoned, comprehensive and helpful answers. In practical applications, models also need to engage in extended discussions on potential scenarios closely relevant to the question. With the augmentation of a retrieval module, open-source LLMs can produce coherent answers, often with different focuses, but they are still sub-optimal in terms of reliable evidence selection and in-depth question analysis. In this paper, we propose a novel Chain-of-Discussion framework that leverages the synergy among multiple open-source LLMs, aiming to provide more correct and more comprehensive answers for open-ended QA even though the individual models are not strong enough on their own. Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
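The abstract describes the high-level idea: several retrieval-augmented LLMs each draft an answer, read one another's drafts, and revise. The following is only a minimal sketch of that draft-then-revise data flow, not the authors' actual protocol; the function names, prompt formats, and the callable-per-model interface are all assumptions made for illustration.

```python
# Hypothetical sketch of a Chain-of-Discussion-style loop.
# Each "model" is abstracted as a callable that maps a prompt string
# to an answer string; real usage would wrap actual LLM calls.
from typing import Callable, List

Model = Callable[[str], str]

def chain_of_discussion(question: str, evidence: List[str],
                        models: List[Model], rounds: int = 1) -> str:
    """Each model drafts an answer from the question and retrieved
    evidence, then revises its draft after reading the other models'
    drafts; the last model's final draft is returned."""
    context = "\n".join(evidence)
    drafts = [m(f"Question: {question}\nEvidence: {context}")
              for m in models]
    for _ in range(rounds):
        peer_views = "\n".join(drafts)  # every model sees all drafts
        drafts = [m(f"Question: {question}\nEvidence: {context}\n"
                    f"Peer answers:\n{peer_views}\n"
                    f"Revise your answer.")
                  for m in models]
    return drafts[-1]

# Toy stand-ins for open-source LLMs, used only to show the data flow.
model_a: Model = lambda p: "A answered (" + str(len(p)) + " chars seen)"
model_b: Model = lambda p: "B answered (" + str(len(p)) + " chars seen)"

answer = chain_of_discussion("What is X?", ["doc1"], [model_a, model_b])
```

The sketch keeps the loop synchronous and returns a single model's revision; the paper's framework presumably also involves evidence selection and answer merging, which are omitted here.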
