"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility (2210.07471v2)

Published 14 Oct 2022 in cs.CL

Abstract: In current NLP research, large-scale language models (LLMs) and their abilities are widely discussed. Some recent works have also found notable failures of these models, and these failure examples often involve complex reasoning abilities. This work focuses on a simple commonsense ability: reasoning about when an action (or its effect) is feasible. To this end, we introduce FeasibilityQA, a question-answering dataset involving binary classification (BCQ) and multiple-choice multiple-correct (MCQ) questions that test understanding of feasibility. We show that even state-of-the-art models such as GPT-3, GPT-2, and T5 struggle to answer the feasibility questions correctly. Specifically, GPT-3 achieves accuracies of only 19% and 62% on MCQ, and 25% and 64% on BCQ, in zero-shot and few-shot settings, respectively. We also evaluate the models after providing the relevant knowledge statements required to answer each question. We find that the additional knowledge leads to a 7% gain in performance, but the overall performance still remains low. These results make one wonder how much commonsense knowledge about action feasibility is encoded in state-of-the-art models and how well they can reason about it.
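
To make the evaluation setup concrete, below is a minimal sketch (not the authors' actual harness) of how a zero-shot BCQ feasibility question with an optional knowledge statement could be posed to a generative model. GPT-2 via the Hugging Face transformers pipeline is used purely for illustration; the prompt wording and answer parsing here are assumptions, not the paper's exact prompts.

```python
# Hedged sketch of zero-shot BCQ feasibility evaluation, knowledge-augmented variant.
from transformers import pipeline

# GPT-2 stands in for the larger models evaluated in the paper (GPT-3, T5, etc.).
generator = pipeline("text-generation", model="gpt2")

# Knowledge statement and question are illustrative, mirroring the paper's title example.
knowledge = "A person's child must be younger than the person."
question = "John is 50 years old. Can his son be 65 years old? Answer yes or no:"
prompt = f"{knowledge}\n{question}"

# Greedy decoding of a short continuation, then crude yes/no parsing.
output = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
answer = output[len(prompt):].strip().lower()
if answer.startswith("yes"):
    predicted = "yes"
elif answer.startswith("no"):
    predicted = "no"
else:
    predicted = "unknown"

print(predicted)  # the gold feasibility label for this question is "no"
```

Dropping the `knowledge` line from the prompt gives the plain zero-shot setting; prepending a few solved examples would give the few-shot setting reported in the abstract.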

Authors (8)
  1. Himanshu Gupta (54 papers)
  2. Neeraj Varshney (47 papers)
  3. Swaroop Mishra (60 papers)
  4. Kuntal Kumar Pal (13 papers)
  5. Saurabh Arjun Sawant (4 papers)
  6. Kevin Scaria (7 papers)
  7. Siddharth Goyal (13 papers)
  8. Chitta Baral (152 papers)
Citations (14)