Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples (2305.15269v3)

Published 24 May 2023 in cs.CL and cs.AI

Abstract: Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that LLMs possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, and from the same distribution as the in-context examples. To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize from simpler demonstrations to more complex proofs along multiple axes: depth-, width-, and compositional generalization. To facilitate systematic exploration, we construct a new synthetic and programmable reasoning dataset that enables control over deduction rules and proof complexity. Our experiments on four LLMs of various sizes and training objectives show that they are able to generalize to compositional proofs. However, they have difficulty generalizing to longer proofs, and they require explicit demonstrations to produce hypothetical subproofs, specifically in proof by cases and proof by contradiction.

Authors (7)
  1. Abulhair Saparov (17 papers)
  2. Richard Yuanzhe Pang (26 papers)
  3. Vishakh Padmakumar (22 papers)
  4. Nitish Joshi (13 papers)
  5. Seyed Mehran Kazemi (17 papers)
  6. Najoung Kim (28 papers)
  7. He He (71 papers)
Citations (63)