
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (2002.04326v3)

Published 11 Feb 2020 in cs.CL, cs.AI, and cs.LG

Abstract: Recent powerful pre-trained LLMs have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more comprehensive reasoning of text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which are often exploited by models to achieve high accuracy without truly understanding the text. In order to comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into EASY set while the rest as HARD set. Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set. However, they struggle on HARD set with poor performance near that of random guess, indicating more research is needed to essentially enhance the logical reasoning ability of current models.

An Evaluation of Logical Reasoning in Machine Reading Comprehension: Insights from ReClor

The paper presents ReClor, a dataset designed to challenge machine reading comprehension (MRC) models by requiring logical reasoning, a critical yet often underrepresented cognitive ability. ReClor arrives at a timely moment, given the near saturation of performance by state-of-the-art (SOTA) models such as BERT, GPT-2, XLNet, and RoBERTa on traditional MRC datasets. Despite posting impressive metrics on those datasets, these models have not been rigorously assessed on logical reasoning, a capability crucial for comprehensive text understanding.

Dataset Overview

ReClor is derived from logical reasoning questions used in standardized graduate admission tests such as the GMAT and LSAT. It comprises 6,138 questions, purposefully selected to require intricate logical reasoning. The dataset is distinctive in separating biased from unbiased data points, denoted the EASY and HARD sets respectively: questions that can be answered correctly by exploiting biases in the answer options alone go to the EASY set, while those requiring genuine comprehension of the context form the HARD set.
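The split described above can be sketched as follows: if models trained on the answer options alone (with the context and question removed) answer a question correctly, the question is flagged as biased and assigned to the EASY set. A minimal sketch, where `option_only_preds` is a hypothetical stand-in for the predictions of such option-only models:

```python
# Split a dataset into EASY (answerable from options alone) and HARD sets.
# `option_only_preds` stands in for predictions from models trained on the
# answer options only, with context and question removed (illustrative data,
# not the paper's actual model outputs).

def split_easy_hard(questions, option_only_preds):
    """Questions solved by option-only models go to EASY; the rest to HARD."""
    easy, hard = [], []
    for q, pred in zip(questions, option_only_preds):
        (easy if pred == q["label"] else hard).append(q)
    return easy, hard

# Toy example with three four-option questions.
questions = [
    {"id": 0, "label": 2},
    {"id": 1, "label": 0},
    {"id": 2, "label": 3},
]
option_only_preds = [2, 1, 3]  # option-only model gets questions 0 and 2 right

easy, hard = split_easy_hard(questions, option_only_preds)
print([q["id"] for q in easy])  # biased questions: answerable without the context
print([q["id"] for q in hard])  # questions that require reading the context
```

The key design choice is that the splitting signal comes from models that never see the context, so any question they solve must leak its answer through the options themselves.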

Model Performance and Analysis

The empirical evaluation shows that SOTA models such as GPT-2 and RoBERTa perform well on the EASY subset but struggle on the HARD subset, where their accuracy approaches that of random guessing. Human performance, by contrast, remains consistently high across both subsets. This split in performance highlights the models' reliance on dataset biases rather than genuine reasoning.
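The gap can be made concrete by comparing each subset's accuracy against the random-guess baseline, which is 25% for four-option questions. A minimal sketch with illustrative numbers (not the paper's reported figures):

```python
# Compare per-subset accuracy against the random-guess baseline.
# With four answer options per question, random guessing yields 25%.

def accuracy(preds, labels):
    """Fraction of predictions matching the gold labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

RANDOM_BASELINE = 1 / 4  # four answer options per question

# Illustrative predictions and labels for the two subsets (toy data only).
easy_preds, easy_labels = [2, 0, 1, 3], [2, 0, 1, 0]
hard_preds, hard_labels = [1, 1, 2, 0], [3, 1, 0, 2]

easy_acc = accuracy(easy_preds, easy_labels)
hard_acc = accuracy(hard_preds, hard_labels)
print(f"EASY accuracy: {easy_acc:.2f}")
print(f"HARD accuracy: {hard_acc:.2f} (random baseline: {RANDOM_BASELINE:.2f})")
```

A model whose HARD accuracy sits at the baseline, as in this toy example, is indistinguishable from a random guesser on the unbiased questions, which is the pattern the paper reports for SOTA models.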

Transfer learning helps: pre-training on the RACE dataset and then fine-tuning on ReClor improves model performance, yet it still lags behind human capability, particularly on questions that demand genuine logical reasoning.

Implications and Future Directions

The development of ReClor underscores a pivotal need to advance models beyond lexical and syntactic pattern matching toward a deeper understanding that involves logical reasoning. Models must move from exploiting dataset biases to demonstrating competence across reasoning types, including identifying assumptions, drawing implications, and resolving apparent inconsistencies in a text.

Practically, enhancing logical reasoning capabilities within NLP systems should improve applications in domains that require nuanced decision-making or careful text understanding, such as legal tech and automated critical analysis tools. Theoretically, further exploration of transfer learning strategies and novel architectures could yield significant advances in logical reasoning.

In conclusion, while current models show commendable progress, the findings from ReClor reemphasize the need for ongoing research to equip AI models with true logical reasoning abilities. Scholars and practitioners should heed the insights provided by ReClor to foster developments that bridge the current gap between human and machine text comprehension.

Authors: Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng

Citations: 214