
RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms (2005.00782v4)

Published 2 May 2020 in cs.CL, cs.AI, and cs.LO

Abstract: Pre-trained language models (PTLMs) have achieved impressive performance on commonsense inference benchmarks, but their ability to employ commonsense to make robust inferences, which is crucial for effective communication with humans, is debated. In the pursuit of advancing fluid human-AI communication, we propose a new challenge, RICA: Robust Inference capability based on Commonsense Axioms, that evaluates robust commonsense inference despite textual perturbations. To generate data for this challenge, we develop a systematic and scalable procedure using commonsense knowledge bases and probe PTLMs across two different evaluation settings. Extensive experiments on our generated probe sets with more than 10k statements show that PTLMs perform no better than random guessing in the zero-shot setting, are heavily impacted by statistical biases, and are not robust to perturbation attacks. We also find that fine-tuning on similar statements offers limited gains, as PTLMs still fail to generalize to unseen inferences. Our new large-scale benchmark exposes a significant gap between PTLMs and human-level language understanding and offers a new challenge for PTLMs to demonstrate commonsense.
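The zero-shot probing setting described above can be made concrete with a small sketch. The snippet below is a minimal illustration, assuming the HuggingFace `transformers` fill-mask pipeline and a hypothetical axiom statement written for this example (it is not drawn from the released RICA probe sets): the model must choose between two contrastive fillers, and the paper's finding is that PTLMs score near chance on such pairs.

```python
# Minimal sketch of a zero-shot, RICA-style commonsense probe.
# Hypothetical statement encoding the axiom "heavier objects are
# more likely to sink"; the masked LM must pick the correct filler.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

statement = ("The stone is heavier than the leaf, "
             "so the stone is <mask> likely to sink in water.")

# Restrict scoring to the two contrastive candidates.
results = fill_mask(statement, targets=[" more", " less"])
scores = {r["token_str"].strip(): r["score"] for r in results}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)  # robust inference should prefer "more"
```

A perturbation attack in the paper's sense would then rephrase the same statement (e.g., negation or paraphrase) and check whether the model's preference stays logically consistent.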

Authors (7)
  1. Pei Zhou (30 papers)
  2. Rahul Khanna (4 papers)
  3. Seyeon Lee (6 papers)
  4. Bill Yuchen Lin (72 papers)
  5. Daniel Ho (18 papers)
  6. Jay Pujara (44 papers)
  7. Xiang Ren (194 papers)
Citations (36)