
Language Models can be Logical Solvers (2311.06158v1)

Published 10 Nov 2023 in cs.CL and cs.AI

Abstract: Logical reasoning is a fundamental aspect of human intelligence and a key component of tasks like problem-solving and decision-making. Recent advancements have enabled LLMs to potentially exhibit reasoning capabilities, but complex logical reasoning remains a challenge. The state-of-the-art, solver-augmented LLMs, use LLMs to parse natural language logical questions into symbolic representations first and then adopt external logical solvers to take in the symbolic representations and output the answers. Despite their impressive performance, any parsing errors will inevitably result in the failure of the execution of the external logical solver and no answer to the logical questions. In this paper, we introduce LoGiPT, a novel LLM that directly emulates the reasoning processes of logical solvers and bypasses the parsing errors by learning to strictly adhere to solver syntax and grammar. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers. Experimental results on two public deductive reasoning datasets demonstrate that LoGiPT outperforms state-of-the-art solver-augmented LMs and few-shot prompting methods on competitive LLMs like ChatGPT or GPT-4.

LLMs as Logical Solvers: An Analysis of LoGiPT

The paper discusses the development and evaluation of LoGiPT, a novel LLM designed to address the limitations of existing solver-augmented LLMs in logical reasoning tasks. Logical reasoning is an essential cognitive function, crucial in domains requiring problem-solving and decision-making, yet challenging for conventional LLMs due to the inherent complexity and precision required in parsing and reasoning over logical forms.

Background and Motivation

Traditional solver-augmented LLMs typically function by converting natural language (NL) logical statements into symbolic representations, which are then processed by external logical solvers to derive truth values. While this approach ensures logical rigor during the reasoning phase, it is inherently fragile; any errors in parsing the NL input into symbolic logic can lead to complete failures in obtaining valid outputs. Experimental observations underscore this weakness, with models such as Vicuna-13B achieving only a 17% success rate in parsing tasks on datasets like ProofWriter.
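The fragility of this pipeline can be illustrated with a minimal sketch. The function names below are hypothetical stand-ins, not the paper's implementation: one plays the role of the LLM parser emitting solver syntax, the other the strict external solver. A single malformed clause from the parser leaves the solver with no valid program, and thus no answer at all.

```python
# Minimal sketch of a solver-augmented pipeline (hypothetical function
# names, not the paper's code). The LLM parses NL into symbolic form;
# the external solver is strict, so any syntax error yields no answer.

def llm_parse_to_symbolic(question: str) -> str:
    # Stand-in for an LLM call that emits Prolog-style clauses.
    # Here the parser drops the final period, a typical parsing slip.
    return "cold(alan). furry(X) :- cold(X)"

def run_external_solver(program: str) -> str:
    # Stand-in for Pyke/Prolog execution with strict syntax checking.
    if not program.rstrip().endswith("."):
        raise SyntaxError("malformed clause")
    return "True"

def answer(question: str) -> str:
    try:
        return run_external_solver(llm_parse_to_symbolic(question))
    except SyntaxError:
        # The parsing error propagates to a total failure of the query.
        return "no answer"

print(answer("Is Alan furry?"))  # -> "no answer"
```

This all-or-nothing failure mode is exactly what LoGiPT is designed to avoid by keeping the reasoning inside the model.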

LoGiPT: Design and Methodology

LoGiPT is introduced to overcome these parsing-error limitations. In contrast to the pipeline approach, the model is fine-tuned to internalize and emulate the deductive reasoning processes of logical solvers directly within the LLM architecture, removing the need for external solvers and mitigating the risk of parsing failures.

The development of LoGiPT involved several steps:

  1. Data Construction: A new instruction-tuning dataset was built by revealing and refining the internal, otherwise invisible reasoning steps of deductive solvers such as Pyke, with Prolog serving as the representative symbolic language. The dataset casts these solver steps in a structured, learnable format.
  2. Model Training: Open-source LLMs such as Vicuna and CodeLlama were fine-tuned using the constructed dataset. Fine-tuning was designed to equip these models with the ability to replicate solver-like reasoning, thereby directly deducing answers from NL inputs without symbolic translation.
  3. Reasoning Process: LoGiPT follows a four-turn conversational design: presenting the logical context, deducing implications in symbolic form, posing the query statement, and resolving its truth value, all in a structured format that simulates the stepwise progression of a traditional solver.
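The deduction phase that LoGiPT learns to verbalize is essentially forward chaining to a fixed point, as a rule-based solver would perform it. The sketch below uses illustrative single-premise rules (not the paper's dataset format) to show the kind of stepwise derivation the model's intermediate turns spell out.

```python
# Hedged sketch of forward-chaining deduction, the solver behavior
# LoGiPT emulates turn by turn. Facts and rules are illustrative,
# not drawn from the paper's actual training data.

facts = {("cold", "alan")}
# Each rule reads: if premise(x) holds, then conclusion(x) holds.
rules = [("cold", "furry"), ("furry", "nice")]

derived = set(facts)
changed = True
while changed:  # iterate until no new fact can be derived
    changed = False
    for premise, conclusion in rules:
        for pred, entity in list(derived):
            if pred == premise and (conclusion, entity) not in derived:
                derived.add((conclusion, entity))  # record new fact
                changed = True

# Final turn: resolve the truth value of the queried statement.
print(("nice", "alan") in derived)  # -> True
```

In LoGiPT's training data, each newly derived fact corresponds to an explicit, solver-syntax step in the model's response, which is what enforces strict adherence to solver grammar.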

Experimental Evaluation

LoGiPT was tested against state-of-the-art solver-augmented models and traditional LLMs, including closed-source models like GPT-3.5 and GPT-4. Experiments on the ProofWriter and PrOntoQA datasets show that LoGiPT significantly outperforms both solver-pipeline approaches and standard few-shot prompting methods in these settings. Notably, LoGiPT (CodeLlama-13b-hf) achieved an accuracy of 89.5% on ProofWriter, surpassing Logic-LM (a solver-augmented approach using GPT-4) by 9.84 percentage points. These results make a strong case for internalizing logical processes within LMs, enhancing their reliability and performance in reasoning tasks.

Implications and Future Work

LoGiPT represents a promising direction in model design, potentially transforming LLMs into robust solvers capable of nuanced reasoning directly over natural language constructs. This approach not only offers improved accuracy and reliability but also simplifies the model architecture by removing dependency on separate symbolic solvers.

Future developments could explore the adaptability of LoGiPT across a broader range of logical reasoning tasks and application domains. Additionally, further enhancement in emulating more intricate logical paradigms or integrating hybrid symbolic-neural techniques might offer even greater generalization capabilities and reasoning depth.

In conclusion, LoGiPT represents a meaningful step toward combining the linguistic fluency of LLMs with strict logical deduction, setting a new benchmark for reasoning tasks within AI research.

Authors (7)
  1. Jiazhan Feng (11 papers)
  2. Ruochen Xu (35 papers)
  3. Junheng Hao (8 papers)
  4. Hiteshi Sharma (12 papers)
  5. Yelong Shen (83 papers)
  6. Dongyan Zhao (144 papers)
  7. Weizhu Chen (128 papers)
Citations (17)