
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models (2401.00757v3)

Published 1 Jan 2024 in cs.SE, cs.AI, cs.CL, and cs.LO

Abstract: We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of LLMs such as ChatGPT and GPT-4. Despite LLMs' prowess in tasks like writing assistance, code generation, and machine translation, assessing their ability to reason has been challenging. Traditional evaluations often prioritize accuracy on downstream tasks over direct assessments of reasoning processes. LogicAsker addresses this gap by employing a set of atomic reasoning skills grounded in propositional and predicate logic to systematically examine and improve the reasoning prowess of LLMs. Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29\% to 90\% across different models. Moreover, we leverage these findings to construct targeted demonstration examples and fine-tuning data, notably enhancing logical reasoning in models like GPT-4o by up to 5\%. To our knowledge, this is the first effort to utilize test case outcomes to effectively refine LLMs' formal reasoning capabilities. We make our code, data, and results publicly available (https://github.com/yxwan123/LogicAsker) to facilitate further research and replication of our findings.
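The abstract describes probing LLMs with test cases built from atomic inference rules of propositional logic. A minimal sketch of this idea follows; the rule templates, function names, and question phrasing are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of LogicAsker-style test-case generation from
# atomic inference rules (e.g., modus ponens). Templates and names
# here are illustrative assumptions, not the paper's real code.

RULES = {
    "modus_ponens": {
        "premises": ["If {p}, then {q}.", "{p}."],
        "conclusion": "{q}",
    },
    "modus_tollens": {
        "premises": ["If {p}, then {q}.", "It is not the case that {q}."],
        "conclusion": "it is not the case that {p}",
    },
}

def make_test_case(rule_name: str, p: str, q: str) -> dict:
    """Instantiate an atomic rule with concrete propositions and
    phrase it as a yes/no reasoning question for an LLM."""
    rule = RULES[rule_name]
    premises = " ".join(t.format(p=p, q=q) for t in rule["premises"])
    conclusion = rule["conclusion"].format(p=p, q=q)
    question = (f"{premises} Can we conclude that {conclusion}? "
                "Answer yes or no.")
    # For a valid rule instance the expected answer is always "yes";
    # a model answering "no" counts as a reasoning failure on this rule.
    return {"rule": rule_name, "question": question, "expected": "yes"}

case = make_test_case("modus_ponens", "it rains", "the ground is wet")
print(case["question"])
```

A harness in this style could tally failures per rule to locate the weakest skills, then reuse those same instantiated cases as demonstration examples or fine-tuning data, mirroring the improvement loop the abstract outlines.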

Authors (8)
  1. Yuxuan Wan (28 papers)
  2. Wenxuan Wang (128 papers)
  3. Yiliu Yang (1 paper)
  4. Youliang Yuan (18 papers)
  5. Jen-tse Huang (46 papers)
  6. Pinjia He (47 papers)
  7. Wenxiang Jiao (44 papers)
  8. Michael R. Lyu (176 papers)
Citations (7)