
RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning (2205.12598v2)

Published 25 May 2022 in cs.CL, cs.LG, and cs.LO

Abstract: Transformers have been shown to perform deductive reasoning over a logical rulebase containing rules and statements written in English natural language. While this progress is promising, it is currently unclear whether these models genuinely perform logical reasoning by understanding the underlying logical semantics of the language. To this end, we propose RobustLR, a suite of evaluation datasets that test the robustness of these models to minimal logical edits in rulebases and to some standard logical equivalence conditions. In our experiments with RoBERTa and T5, we find that models trained in prior work do not perform consistently across the different perturbations in RobustLR, showing that they are not robust to the proposed logical perturbations. Further, we find that the models have particular difficulty learning the logical negation and disjunction operators. Overall, our evaluation sets demonstrate shortcomings of deductive-reasoning language models, which can help guide the design of better models for logical reasoning over natural language. All the datasets and the code base have been made publicly available.
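
To make the evaluation idea concrete, below is a minimal, hypothetical Python sketch of a RobustLR-style consistency check: apply a minimal logical edit to a rulebase (here, negating one fact) and test whether a model's entailment prediction changes in the logically expected way. This is not the authors' released code; the function predict_entailment is a stand-in for a fine-tuned classifier (e.g. RoBERTa or T5), implemented here as a trivial keyword matcher so the script runs end to end.

def predict_entailment(context: str, statement: str) -> bool:
    # Stand-in model: replace with a real model call (e.g. a fine-tuned
    # RoBERTa/T5 classifier). Here: naive substring matching.
    return statement.lower() in context.lower()

def negate_fact(fact: str) -> str:
    # Minimal logical edit: toggle a negation in one fact.
    if " is not " in fact:
        return fact.replace(" is not ", " is ", 1)
    return fact.replace(" is ", " is not ", 1)

def consistency_check(rules, facts, statement, perturb_index=0):
    # Compare predictions before and after one fact is negated.
    # A robust model should change its answer only when the perturbation
    # actually changes whether the statement follows from the rulebase.
    original_context = " ".join(rules + facts)
    perturbed_facts = list(facts)
    perturbed_facts[perturb_index] = negate_fact(facts[perturb_index])
    perturbed_context = " ".join(rules + perturbed_facts)

    before = predict_entailment(original_context, statement)
    after = predict_entailment(perturbed_context, statement)
    return {"before": before, "after": after, "changed": before != after}

if __name__ == "__main__":
    rules = ["If something is a cat then it is an animal."]
    facts = ["Tom is a cat."]
    statement = "Tom is a cat."
    print(consistency_check(rules, facts, statement))

In the actual benchmark, the perturbations cover minimal edits and standard logical equivalences (e.g. rewriting rules with negation, conjunction, and disjunction), and model predictions are scored for consistency across the original and perturbed rulebases.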

Authors (3)
  1. Soumya Sanyal (16 papers)
  2. Zeyi Liao (14 papers)
  3. Xiang Ren (194 papers)
Citations (18)