
Answering Questions by Meta-Reasoning over Multiple Chains of Thought (2304.13007v4)

Published 25 Apr 2023 in cs.CL and cs.AI

Abstract: Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach which prompts LLMs to meta-reason over multiple chains of thought, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify its answers.
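The abstract describes MCR only at a high level; below is a minimal, illustrative sketch of the idea, assuming a generic `generate` callable that wraps whatever LLM is used. The prompt wording, function names, and the toy stand-in model are assumptions made for illustration, not the paper's actual prompts or code.

```python
# Minimal sketch of the Multi-Chain Reasoning (MCR) idea described in the abstract.
# `generate` stands in for any LLM completion function; prompt templates and helper
# names here are illustrative assumptions, not the paper's exact prompts.
from typing import Callable, List


def sample_chains(question: str,
                  generate: Callable[[str], str],
                  n_chains: int = 5) -> List[str]:
    """Sample several independent chains of thought for the same question."""
    prompt = f"Q: {question}\nLet's think step by step.\nA:"
    return [generate(prompt) for _ in range(n_chains)]


def meta_reason(question: str,
                chains: List[str],
                generate: Callable[[str], str]) -> str:
    """Instead of voting over final answers, ask the model to read all
    intermediate steps, combine the relevant facts across chains, and produce
    one unified explanation plus answer (the meta-reasoning step)."""
    evidence = "\n\n".join(f"Chain {i + 1}:\n{c}" for i, c in enumerate(chains))
    meta_prompt = (
        "You are given several reasoning chains for the same question. "
        "Select and combine the relevant facts across chains, then give one "
        "explanation followed by the final answer.\n\n"
        f"{evidence}\n\nQuestion: {question}\nExplanation and answer:"
    )
    return generate(meta_prompt)


if __name__ == "__main__":
    # Toy stand-in model so the sketch runs end to end without an API key.
    def toy_generate(prompt: str) -> str:
        return "Step 1: ... Step 2: ... Answer: 42"

    q = "Who directed the film released the year the Eiffel Tower opened?"
    chains = sample_chains(q, toy_generate)
    print(meta_reason(q, chains, toy_generate))
```

The contrast with self-consistency voting is in `meta_reason`: the intermediate steps of all sampled chains are passed back to the model as evidence, so the final answer comes with a single unified explanation rather than a majority vote over discarded reasoning.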

Authors (6)
  1. Ori Yoran (13 papers)
  2. Tomer Wolfson (11 papers)
  3. Ben Bogin (22 papers)
  4. Uri Katz (5 papers)
  5. Daniel Deutch (23 papers)
  6. Jonathan Berant (107 papers)
Citations (79)