Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic (2309.13339v4)

Published 23 Sep 2023 in cs.CL, cs.AI, cs.LG, and cs.SC

Abstract: Recent advancements in LLMs have showcased their remarkable generalizability across various domains. However, their reasoning abilities still have significant room for improvement, especially when confronted with scenarios requiring multi-step reasoning. Although LLMs possess extensive knowledge, their reasoning often fails to effectively utilize this knowledge to establish a coherent thinking paradigm. These models sometimes show hallucinations as their reasoning procedures are unconstrained by logical principles. Aiming at improving the zero-shot chain-of-thought reasoning ability of LLMs, we propose LoT (Logical Thoughts), a self-improvement prompting framework that leverages principles rooted in symbolic logic, particularly Reductio ad Absurdum, to systematically verify and rectify the reasoning processes step by step. Experimental evaluations conducted on language tasks in diverse domains, including arithmetic, commonsense, symbolic, causal inference, and social problems, demonstrate the efficacy of enhanced reasoning by logic. The implementation code for LoT can be accessed at: https://github.com/xf-zhao/LoT.

Enhancing Zero-Shot Chain-of-Thought Reasoning in LLMs through Logic

The research presented focuses on the challenge of improving the reasoning capabilities of LLMs by leveraging logical principles, particularly in the context of zero-shot Chain-of-Thought (CoT) reasoning. Despite their substantial success across diverse applications, LLMs frequently encounter issues such as hallucinations and untrustworthy deductions due to their inability to systematically employ logical principles during multi-step reasoning processes. This paper introduces the Logical Thoughts (LoT) prompting framework, which aims to enhance reasoning capabilities by systematically verifying and rectifying reasoning steps using principles rooted in symbolic logic, notably Reductio ad Absurdum.

The researchers identified a gap in the reasoning processes of LLMs: although these models possess extensive knowledge, they often fail to apply it systematically within coherent reasoning paradigms. This leads to logical errors and hallucinations, where LLMs might confidently assert incorrect statements. The LoT framework aims to address these issues by introducing a self-improvement mechanism that encourages models to think, verify, and revise reasoning steps iteratively.

Methodology and Implementation

The LoT framework is built on the principle of Reductio ad Absurdum, which establishes the validity of a statement by assuming its negation and deriving a contradiction. Applying this, the authors propose a structured prompting strategy in which the LLM is guided to generate and examine alternative views of each reasoning step: the original step, its negation, and a supporting explanation for each. The model then adopts whichever reasoning path it judges more plausible, revising the step when the negation wins out.
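
As a rough illustration of this verify-and-revise loop, the sketch below shows one way the procedure could be wired up. It is not the authors' implementation (their code is in the linked repository): `complete` is a placeholder for any LLM completion function, and the prompt wordings are illustrative rather than the paper's exact templates.

```python
from typing import Callable, List

def lot_reasoning(question: str, complete: Callable[[str], str], max_steps: int = 8) -> List[str]:
    """Generate a chain of thought, checking each step via reductio ad absurdum."""
    steps: List[str] = []
    for _ in range(max_steps):
        context = question + "\n" + "\n".join(steps)
        # 1. Propose the next reasoning step (standard zero-shot CoT style).
        step = complete(context + "\nLet's think step by step. Next step:")
        if "final answer" in step.lower():
            steps.append(step)
            break
        # 2. Build the two candidate views: the step and its negation,
        #    each with a supporting explanation.
        support = complete(context + f"\nExplain why this step is valid: {step}")
        negation = complete(context + f"\nState the negation of: {step}")
        rebuttal = complete(context + f"\nExplain why this might hold instead: {negation}")
        # 3. Ask the model to adopt the more plausible view; if the negation
        #    wins, the contradiction triggers a revision of the step.
        verdict = complete(
            context
            + f"\nStep A: {step}\nReason A: {support}"
            + f"\nStep B: {negation}\nReason B: {rebuttal}"
            + "\nWhich step is more plausible, A or B? Answer with a single letter."
        )
        if verdict.strip().upper().startswith("B"):
            step = complete(
                context
                + f"\nThe step '{step}' appears flawed because: {rebuttal}"
                + "\nGive a corrected step:"
            )
        steps.append(step)
    return steps
```

Plugging any chat-model wrapper in as `complete` turns this into a drop-in replacement for a plain zero-shot CoT loop, with the verification prompts adding extra model calls per step.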

LoT was implemented and evaluated on a variety of language tasks spanning arithmetic, commonsense reasoning, causal inference, and social reasoning, demonstrating the framework's versatility and effectiveness. The evaluation compared zero-shot CoT augmented with LoT against standard zero-shot CoT across several datasets and model scales, including Vicuna variants and GPT models.

Results and Discussion

The results highlighted notable performance improvements when LoT was applied. Models using LoT generally outperformed the standard zero-shot CoT models across multiple datasets. For instance, the LoT approach showed increased accuracy in datasets like GSM8K, AQuA, and tasks involving complex commonsense reasoning. The performance gains were more consistent with larger model sizes, suggesting the enhanced capacity of these models to integrate and adapt logical corrections within their reasoning processes.

The findings also revealed that LoT prompts models to engage in deeper reasoning paths, correcting missteps and revising their conclusions more effectively than standard CoT. This indicates that the integration of logical verification steps leads to more reliable and accurate reasoning outputs. However, some limitations were observed, such as occasional revisions not aligning with ground truth answers due to the inherent bias in LLM generations.

Implications for AI and Future Research

The introduction of LoT provides a structured pathway to systematically improve the reasoning capabilities of LLMs by embedding logical verification into their decision-making processes. This has significant implications for the deployment of AI systems in applications demanding high accuracy and reliability, such as autonomous decision-making and expert systems in critical fields.

Future research could explore extending the LoT methodology into few-shot learning contexts or other NLU tasks that benefit from structured reasoning. Additionally, enhancing the efficiency of the verification process and exploring integration with neurosymbolic approaches could further improve the robustness of LLMs. Moreover, leveraging LoT in reinforcement learning frameworks like RLAIF could yield advances in aligning AI behaviors with human-like reasoning patterns, offering a pathway toward more autonomous and adaptive AI systems.

Authors (7)
  1. Xufeng Zhao
  2. Mengdi Li
  3. Wenhao Lu
  4. Cornelius Weber
  5. Jae Hee Lee
  6. Kun Chu
  7. Stefan Wermter