
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts (2310.14628v2)

Published 23 Oct 2023 in cs.CL

Abstract: As LLMs have shown effectiveness with different prompting methods, such as Chain of Thought, Program of Thought, we find that these methods have formed a great complementarity to each other on math reasoning tasks. In this work, we propose XoT, an integrated problem solving framework by prompting LLMs with diverse reasoning thoughts. For each question, XoT always begins with selecting the most suitable method then executes each method iteratively. Within each iteration, XoT actively checks the validity of the generated answer and incorporates the feedback from external executors, allowing it to dynamically switch among different prompting methods. Through extensive experiments on 10 popular math reasoning datasets, we demonstrate the effectiveness of our proposed approach and thoroughly analyze the strengths of each module. Moreover, empirical results suggest that our framework is orthogonal to recent work that makes improvements on single reasoning methods and can further generalise to logical reasoning domain. By allowing method switching, XoT provides a fresh perspective on the collaborative integration of diverse reasoning thoughts in a unified framework. The code is available at https://github.com/tengxiaoliu/XoT.

Integrated Reasoning with Diverse X-of-Thoughts

The paper "Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts" introduces a novel framework to enhance the problem-solving capabilities of LLMs by leveraging the complementarity of distinct prompting strategies. The authors propose the XoT framework, an integrated system that orchestrates different reasoning methods, particularly for mathematical reasoning tasks. It strategically selects and switches between these methods, using both active and passive verification mechanisms to optimize performance.

Overview of the XoT Framework

The XoT framework comprises three critical components: the planning module, the reasoning module, and the verification module. Each of these modules plays a specific role in iteratively solving problems by adapting various strategies:

  1. Planning Module: This module is designed to select the most appropriate problem-solving method depending on the characteristics of the input question. The paper identifies multiple reasoning approaches, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and the newly introduced Equation-of-Thought (EoT). The module capitalizes on the unique strengths of each approach to enhance reasoning efficiency.
  2. Reasoning Module: Once the planning module selects a method, the reasoning module generates a solution in that format. CoT produces a step-by-step natural-language rationale, PoT writes a Python program whose execution yields the answer, and EoT expresses the problem as a system of linear equations to be solved externally (a concrete example appears after this list).
  3. Verification Module: This component checks the validity of the generated solution. It combines passive verification, which executes the generated program or equations externally, with active verification, which re-examines the candidate answer against the problem's conditions. A failed verification prompts the system to switch to another reasoning method, as sketched in the loop below.
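
To make the interplay of the three modules concrete, below is a minimal Python sketch of the plan-verify-switch loop, written from the description above. All names (plan_method, reason, passive_verify, active_verify, xot_solve, and the llm callable) are hypothetical placeholders for illustration, not the authors' released implementation; the paper's actual prompts and executors differ.

```python
import re
from typing import Callable, Optional

# Method tags: Program-of-Thought, Equation-of-Thought, Chain-of-Thought.
METHODS = ["PoT", "EoT", "CoT"]


def plan_method(question: str, llm: Callable[[str], str]) -> str:
    """Planning module: ask the model which reasoning format suits the question."""
    choice = llm(f"Pick one of {METHODS} to solve this problem:\n{question}").strip()
    return choice if choice in METHODS else "PoT"


def reason(question: str, method: str, llm: Callable[[str], str]) -> str:
    """Reasoning module: generate a solution in the chosen format."""
    return llm(f"Solve the following problem using {method}:\n{question}")


def passive_verify(solution: str, method: str) -> Optional[str]:
    """Passive verification: run executable formats through an external executor.

    Returns the extracted answer, or None when execution fails, which counts as
    a failed verification and triggers a switch to the next method.
    """
    try:
        if method == "PoT":
            scope: dict = {}
            exec(solution, scope)  # the generated program is expected to set `answer`
            return str(scope["answer"])
        if method == "EoT":
            import sympy  # a symbolic solver stands in for the equation executor
            pairs = (line.split("=", 1) for line in solution.splitlines() if "=" in line)
            eqs = [sympy.Eq(sympy.sympify(lhs), sympy.sympify(rhs)) for lhs, rhs in pairs]
            roots = sympy.solve(eqs, dict=True)
            return str(roots[0]) if roots else None
        # CoT has no external executor; read the final number off the rationale.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", solution)
        return numbers[-1] if numbers else None
    except Exception:
        return None


def active_verify(question: str, answer: str, llm: Callable[[str], str]) -> bool:
    """Active verification: substitute the answer back into the problem's
    conditions and let the model judge whether they still hold."""
    verdict = llm(
        f"Question: {question}\nProposed answer: {answer}\n"
        "Check every condition of the question against this answer. "
        "Reply with exactly 'valid' or 'invalid'."
    )
    return verdict.strip().lower().startswith("valid")


def xot_solve(question: str, llm: Callable[[str], str]) -> Optional[str]:
    """Plan, verify and switch: try the planner's first choice, fall back on failure."""
    first = plan_method(question, llm)
    answer = None
    for method in [first] + [m for m in METHODS if m != first]:
        solution = reason(question, method, llm)
        answer = passive_verify(solution, method)
        if answer is not None and active_verify(question, answer, llm):
            return answer  # a verified answer ends the loop
    return answer  # otherwise return the last attempt
```

In the paper, the method pool, prompts, and per-method verification details are richer than this sketch; the point here is only the control flow of planning, verifying, and switching.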
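For a concrete sense of the EoT format, here is a made-up word problem (not taken from the paper) rendered as a linear system and handed to SymPy as the external executor; the final assertion mirrors the spirit of active verification by substituting the answer back into each condition.

```python
import sympy

# Made-up problem: "Alice has twice as many apples as Bob; together they
# have 18 apples. How many apples does Bob have?"
alice, bob = sympy.symbols("alice bob", positive=True)

# EoT-style solution: each condition becomes one equation.
equations = [
    sympy.Eq(alice, 2 * bob),    # Alice has twice as many as Bob
    sympy.Eq(alice + bob, 18),   # together they have 18
]

answer = sympy.solve(equations, [alice, bob], dict=True)[0]
print(answer)  # {alice: 12, bob: 6}

# Active-verification-style check: the answer must satisfy every condition.
assert all(eq.subs(answer) for eq in equations)
```

Because the arithmetic is delegated to an external solver, EoT (like PoT) avoids the calculation slips that plain CoT rationales are prone to, which is one source of the complementarity the paper exploits.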

Experimental Results

Extensive evaluations across ten mathematical reasoning datasets validate the effectiveness of the XoT framework. The system delivers consistent gains over single-method approaches and handles the more complex reasoning problems found in datasets such as GSM8K and SVAMP particularly well. Notably, active verification substantially reduces the false-positive rate, allowing the framework to recognize flawed answers and switch methods when an initial attempt fails.

The framework is also orthogonal to recent advances in individual reasoning methods and can incorporate them, indicating its potential as a generalizable tool across diverse problem domains, including logical reasoning.

Implications and Future Directions

The implications of integrating multiple reasoning methods within a unified framework like XoT are far-reaching. It suggests a pathway for enhancing LLM capabilities in domains where traditional single-path reasoning might falter, such as scientific reasoning, technical problem-solving, and educational technologies requiring step-by-step logical construction.

Moreover, the adaptability of the XoT system to logical reasoning tasks, as demonstrated by experiments on datasets like FOLIO, provides fertile ground for future research. Exploring additional reasoning strategies and enhancing the planning algorithms for dynamic method selection could further improve the efficacy and applicability of this framework.

Conclusion

The XoT framework offers a promising direction for AI by integrating diverse reasoning thoughts and enabling efficient switching among them through its verification processes. This methodological advancement not only contributes to the field of mathematical reasoning but also sets a precedent for future AI systems that demand high adaptability and precision across a spectrum of problem-solving scenarios. As the framework evolves, it may significantly influence how LLMs are applied to complex and varied cognitive tasks.

Authors (7)
  1. Tengxiao Liu (7 papers)
  2. Qipeng Guo (72 papers)
  3. Yuqing Yang (83 papers)
  4. Xiangkun Hu (19 papers)
  5. Yue Zhang (618 papers)
  6. Xipeng Qiu (257 papers)
  7. Zheng Zhang (486 papers)
Citations (26)