GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach (2308.09267v4)

Published 18 Aug 2023 in cs.AI

Abstract: LLMs have showcased impressive reasoning capabilities, particularly when guided by specifically designed prompts in complex reasoning tasks such as math word problems. These models typically solve tasks using a chain-of-thought approach, which not only bolsters their reasoning abilities but also provides valuable insights into their problem-solving process. However, there is still significant room for enhancing the reasoning abilities of LLMs. Some studies suggest that the integration of an LLM output verifier can boost reasoning accuracy without necessitating additional model training. In this paper, we follow these studies and introduce a novel graph-based method to further augment the reasoning capabilities of LLMs. We posit that multiple solutions to a reasoning task, generated by an LLM, can be represented as a reasoning graph due to the logical connections between intermediate steps from different reasoning paths. Therefore, we propose the Reasoning Graph Verifier (GraphReason) to analyze and verify the solutions generated by LLMs. By evaluating these graphs, models can yield more accurate and reliable results. Our experimental results show that our graph-based verification method not only significantly enhances the reasoning abilities of LLMs but also outperforms existing verifier methods in terms of improving these models' reasoning performance.
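The abstract only sketches the core idea: sample several chain-of-thought solutions, merge them into one reasoning graph by collapsing identical intermediate steps from different paths, and then verify candidate answers over that graph. The minimal Python sketch below illustrates that idea under stated assumptions; the function names and the counting heuristic are hypothetical, and the paper's actual GraphReason verifier is a learned graph-based model rather than this simple support count.

```python
# Illustrative sketch only: shows how multiple chain-of-thought solutions can be
# merged into a single reasoning graph and how candidate answers might be scored
# over it. Names (build_reasoning_graph, score_answers) and the scoring heuristic
# are assumptions for illustration, not the paper's implementation.
from collections import defaultdict


def build_reasoning_graph(solutions):
    """Merge several step-by-step solutions into one graph.

    Each solution is a list of intermediate-step strings ending in a final
    answer. Identical steps from different reasoning paths are merged into a
    single node, which is what links the paths together.
    """
    edges = defaultdict(int)           # (step_i, step_j) -> number of paths using this transition
    final_answers = defaultdict(list)  # final answer -> solutions that reach it
    for path in solutions:
        for a, b in zip(path, path[1:]):
            edges[(a, b)] += 1
        final_answers[path[-1]].append(path)
    return edges, final_answers


def score_answers(edges, final_answers):
    """Toy verifier: score each candidate answer by how well-supported its paths
    are in the merged graph (edges shared by several paths count more).
    GraphReason learns this scoring; the heuristic here only mimics the idea."""
    scores = {}
    for answer, paths in final_answers.items():
        scores[answer] = sum(
            edges[(a, b)] for path in paths for a, b in zip(path, path[1:])
        )
    return scores


if __name__ == "__main__":
    # Three sampled solutions to the same math word problem (two agree on 14).
    sampled = [
        ["2*3=6", "6+8=14", "answer: 14"],
        ["6+8=14", "answer: 14"],
        ["2*8=16", "16+3=19", "answer: 19"],
    ]
    edges, finals = build_reasoning_graph(sampled)
    print(score_answers(edges, finals))  # "answer: 14" receives the highest support
```

In this toy version, answers reached by many mutually reinforcing paths score higher, which is the intuition behind verifying whole reasoning graphs rather than isolated solutions.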

Authors (1)
  1. Lang Cao
Citations (7)