Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts (2310.14628v2)
Abstract: As LLMs have proven effective with various prompting methods, such as Chain of Thought and Program of Thought, we find that these methods are strongly complementary on math reasoning tasks. In this work, we propose XoT, an integrated problem-solving framework that prompts LLMs with diverse reasoning thoughts. For each question, XoT begins by selecting the most suitable method, then executes the chosen methods iteratively. Within each iteration, XoT actively checks the validity of the generated answer and incorporates feedback from external executors, allowing it to dynamically switch among prompting methods. Through extensive experiments on 10 popular math reasoning datasets, we demonstrate the effectiveness of our approach and thoroughly analyze the strengths of each module. Moreover, empirical results suggest that our framework is orthogonal to recent work that improves individual reasoning methods, and that it further generalises to the logical reasoning domain. By allowing method switching, XoT offers a fresh perspective on integrating diverse reasoning thoughts within a unified framework. The code is available at https://github.com/tengxiaoliu/XoT.
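Concretely, the plan-execute-verify-switch loop described in the abstract can be read as the following control flow. This is a minimal illustrative sketch in Python: the `plan`, `solve`, and `verify` callables are hypothetical stand-ins for XoT's planning, reasoning, and verification modules, not the authors' API (the actual implementation lives in the linked repository).

```python
# Minimal sketch of the plan-verify-switch loop. All helper callables
# are hypothetical stand-ins, not the authors' implementation.
from typing import Any, Callable, Sequence

def xot_solve(
    question: str,
    methods: Sequence[str],                    # e.g. ["CoT", "PoT", "EoT"]
    plan: Callable[[str], str],                # pick the most suitable method
    solve: Callable[[str, str], Any],          # generate an answer with one method
    verify: Callable[[str, str, Any], bool],   # check answer / executor feedback
) -> Any:
    """Try the planned method first; switch to the next method whenever
    verification rejects the current answer."""
    first = plan(question)
    order = [first] + [m for m in methods if m != first]
    answer = None
    for method in order:
        answer = solve(method, question)
        if verify(method, question, answer):
            return answer      # verification passed: accept and stop
    return answer              # all methods tried: fall back to the last answer

# Toy usage with dummy callables, purely to show the control flow:
if __name__ == "__main__":
    result = xot_solve(
        question="What is 2 + 3?",
        methods=["CoT", "PoT"],
        plan=lambda q: "PoT",                  # planner picks Program of Thought
        solve=lambda m, q: 5,                  # stand-in for LLM + external executor
        verify=lambda m, q, a: a == 5,         # stand-in for answer validation
    )
    print(result)  # 5
```

The key design point the abstract emphasizes is the outer loop: rather than committing to one prompting method, verification failures trigger a switch to the next candidate method.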
- Language models are few-shot learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
- Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. CoRR, abs/2211.12588.
- Teaching large language models to self-debug. CoRR, abs/2304.05128.
- PaLM: Scaling language modeling with pathways. CoRR, abs/2204.02311.
- Training verifiers to solve math word problems. CoRR, abs/2110.14168.
- Edward A. Feigenbaum and Julian Feldman. 1963. Computers and thought.
- Complexity-based prompting for multi-step reasoning. CoRR, abs/2210.00720.
- PAL: program-aided language models. CoRR, abs/2211.10435.
- FOLIO: natural language reasoning with first-order logic. CoRR, abs/2209.00840.
- Solving math word problems by combining language models with symbolic solvers. CoRR, abs/2304.09102.
- Measuring mathematical problem solving with the MATH dataset. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual.
- Carl Hewitt. 1969. PLANNER: A language for proving theorems in robots. In Proceedings of the 1st International Joint Conference on Artificial Intelligence, Washington, DC, USA, May 7-9, 1969, pages 295–302. William Kaufmann.
- Learning to solve arithmetic word problems with verb categorization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 523–533. ACL.
- MathPrompter: Mathematical reasoning using large language models. CoRR, abs/2303.05398.
- Large language models are zero-shot reasoners. In NeurIPS.
- Parsing algebraic word problems into equations. Trans. Assoc. Comput. Linguistics, 3:585–597.
- MAWPS: A math word problem repository. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, pages 1152–1157. The Association for Computational Linguistics.
- CodeRL: Mastering code generation through pretrained models and deep reinforcement learning. In NeurIPS.
- Solving quantitative reasoning problems with language models. In NeurIPS.
- On the advance of making language models better reasoners. CoRR, abs/2206.02336.
- Program induction by rationale generation: Learning to solve and explain algebraic word problems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 158–167. Association for Computational Linguistics.
- Chameleon: Plug-and-play compositional reasoning with large language models. CoRR, abs/2304.09842.
- A survey of deep learning for mathematical reasoning. CoRR, abs/2212.10535.
- Self-refine: Iterative refinement with self-feedback. CoRR, abs/2303.17651.
- OpenAI. 2023. GPT-4 technical report. CoRR, abs/2303.08774.
- Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 3806–3824. Association for Computational Linguistics.
- Are NLP models really able to solve simple math word problems? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, pages 2080–2094. Association for Computational Linguistics.
- REFINER: reasoning feedback on intermediate representations. CoRR, abs/2304.01904.
- The art of SOCRATIC QUESTIONING: zero-shot multimodal reasoning with recursive thinking and self-questioning. CoRR, abs/2305.14999.
- Reasoning with language model prompting: A survey. CoRR, abs/2212.09597.
- Subhro Roy and Dan Roth. 2015. Solving general arithmetic word problems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 1743–1752. The Association for Computational Linguistics.
- Reasoning about quantities in natural language. Trans. Assoc. Comput. Linguistics, 3:1–13.
- Reflexion: an autonomous agent with dynamic memory and self-reflection. CoRR, abs/2303.11366.
- Artificial General Intelligence - 9th International Conference, AGI 2016, New York, NY, USA, July 16-19, 2016, Proceedings, volume 9782 of Lecture Notes in Computer Science. Springer.
- LLaMA: Open and efficient foundation language models. CoRR, abs/2302.13971.
- Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288.
- Danqing Wang and Lei Li. 2023. Learn from mistakes through cooperative interaction with study assistant. CoRR, abs/2305.13829.
- Self-consistency improves chain of thought reasoning in language models. CoRR, abs/2203.11171.
- Deep neural solver for math word problems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pages 845–854. Association for Computational Linguistics.
- Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS.
- Tree of thoughts: Deliberate problem solving with large language models. CoRR, abs/2305.10601.
- Automatic model selection with large language models for reasoning. CoRR, abs/2305.14333.
- Progressive-hint prompting improves reasoning in large language models. CoRR, abs/2304.09797.
- Least-to-most prompting enables complex reasoning in large language models. CoRR, abs/2205.10625.