
Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding (2109.04947v3)

Published 10 Sep 2021 in cs.CL

Abstract: Large-scale, pre-trained language models (LMs) have achieved human-level performance on a breadth of language understanding tasks. However, evaluations only based on end task performance shed little light on machines' true ability in language understanding and reasoning. In this paper, we highlight the importance of evaluating the underlying reasoning process in addition to end performance. Toward this goal, we introduce Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines' reasoning process. Our empirical results show that while large LMs can achieve high end performance, they struggle to support their predictions with valid supporting evidence. The TRIP dataset and our baseline results will motivate verifiable evaluation of commonsense reasoning and facilitate future research toward developing better language understanding and reasoning models.

Authors (4)
  1. Shane Storks (14 papers)
  2. Qiaozi Gao (20 papers)
  3. Yichi Zhang (184 papers)
  4. Joyce Chai (52 papers)
Citations (20)
