Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions (2311.13982v1)
Abstract: LLMs are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is unavailable or out-of-date in the models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. While promising, these chain-based methods suffer from: 1) Negative retrieval: unnecessary or incorrect retrieval may mislead the reasoning; 2) Limited sight: lacking the ability to look backward or forward, a local error in one step propagates along the chain. In this paper, we propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree). First, LLMs translate a complex question into a query tree, in which each non-root node denotes a sub-question of its parent node. Then, probabilistic reasoning is conducted over the tree by solving questions from leaf to root, considering the confidence of both question decomposition and answering. During reasoning, for leaf nodes, LLMs choose the more confident answer between Closed-book QA, which employs parametric knowledge, and Open-book QA, which employs retrieved external knowledge, thus eliminating the negative retrieval problem. For non-leaf nodes, the hierarchical structure gives LLMs a broader view, allowing them to reason globally with the information from child nodes and thus recover from local errors. Experiments on three complex QA datasets under the open-domain setting show that our approach significantly outperforms state-of-the-art (SOTA) methods, demonstrating the effectiveness of probabilistic tree-of-thought reasoning.
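The leaf-to-root reasoning described above can be sketched as a small post-order traversal. This is a minimal illustrative sketch, not the paper's implementation: `closed_book_qa`, `open_book_qa`, and `child_aggregated_qa` are hypothetical stubs standing in for LLM calls, and the fixed confidence values stand in for the model's log-probability-based confidence scores.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node in the query tree: non-root nodes are sub-questions of their parent."""
    question: str
    children: List["Node"] = field(default_factory=list)
    answer: Optional[str] = None
    confidence: float = 0.0


def closed_book_qa(question: str) -> Tuple[str, float]:
    # Stub: answer from parametric knowledge, with a placeholder confidence.
    return f"cb-answer({question})", 0.6


def open_book_qa(question: str) -> Tuple[str, float]:
    # Stub: retrieval-augmented answer, with a placeholder confidence.
    return f"ob-answer({question})", 0.8


def child_aggregated_qa(question: str, children: List[Node]) -> Tuple[str, float]:
    # Stub: answer composed from solved sub-questions; here the confidence
    # multiplies a decomposition score by the children's confidences.
    evidence = "; ".join(f"{c.question} -> {c.answer}" for c in children)
    conf = 0.9
    for c in children:
        conf *= c.confidence
    return f"agg-answer({question} | {evidence})", conf


def solve(node: Node) -> Node:
    """Post-order (leaf-to-root) reasoning: solve children first, then pick
    the most confident answer among the available QA strategies."""
    for child in node.children:
        solve(child)
    candidates = [closed_book_qa(node.question), open_book_qa(node.question)]
    if node.children:  # non-leaf nodes may also reason over their children
        candidates.append(child_aggregated_qa(node.question, node.children))
    node.answer, node.confidence = max(candidates, key=lambda ac: ac[1])
    return node


# Usage: a two-level query tree for a compositional question.
root = Node(
    "Who directed the film that won Best Picture in 1998?",
    children=[
        Node("Which film won Best Picture in 1998?"),
        Node("Who directed #1?"),
    ],
)
solve(root)
print(root.answer, root.confidence)
```

With these stub confidences the open-book answer wins at every node; in the actual method the per-node choice varies with the model's confidence in each strategy, which is how negative retrieval is avoided at leaves and local errors are recovered at internal nodes.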
- Shulin Cao
- Jiajie Zhang
- Jiaxin Shi
- Xin Lv
- Zijun Yao
- Qi Tian
- Juanzi Li
- Lei Hou