Probing Physical Reasoning with Counter-Commonsense Context
Abstract: In this study, we create a CConS (Counter-commonsense Contextual Size comparison) dataset to investigate how physical commonsense affects the contextualized size comparison task; the proposed dataset consists of both contexts that fit physical commonsense and those that do not. This dataset tests the ability of LLMs to predict the size relationship between objects under various contexts generated from our curated noun list and templates. We measure the ability of several masked LLMs and generative models. The results show that while LLMs can use prepositions such as "in" and "into" in the provided context to infer size relationships, they fail to use verbs and thus make incorrect judgments led by their prior physical commonsense.
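To make the idea of template-generated contexts concrete, the following is a minimal sketch of how commonsense and counter-commonsense examples might be produced from a noun list and templates. The nouns, the templates, the size ranks, and the function names (`implied_larger`, `generate_examples`) are all hypothetical and chosen for illustration; they are not taken from the CConS dataset or its generation code.

```python
from itertools import permutations

# Hypothetical nouns and templates for illustration only;
# they are NOT the curated list or templates used in CConS.
NOUNS = ["mouse", "elephant"]
TEMPLATES = [
    ("preposition", "The {a} is in the {b}."),
    ("verb", "The {a} swallowed the {b}."),
]

# Assumed prior knowledge: a rough typical-size rank (higher = larger).
TYPICAL_SIZE = {"mouse": 1, "elephant": 2}


def implied_larger(cue, a, b):
    """Return the object that the context implies is larger.

    "a is in b" implies b is larger; "a swallowed b" implies a is larger.
    """
    return b if cue == "preposition" else a


def generate_examples(nouns=NOUNS, templates=TEMPLATES):
    """Fill every template with every ordered noun pair and label it."""
    examples = []
    for a, b in permutations(nouns, 2):
        for cue, template in templates:
            larger = implied_larger(cue, a, b)
            smaller = a if larger == b else b
            examples.append({
                "context": template.format(a=a, b=b),
                "cue": cue,
                "larger": larger,
                # The context contradicts the prior when the implied-larger
                # object is typically the smaller one.
                "counter_commonsense": TYPICAL_SIZE[larger] < TYPICAL_SIZE[smaller],
            })
    return examples


if __name__ == "__main__":
    for example in generate_examples():
        print(example)
```

In this sketch, each ordered noun pair yields one preposition-cued and one verb-cued context, so the same pair of objects appears in both a context that fits the size prior and one that contradicts it. A model that relies only on its prior would answer the contradicting contexts incorrectly, which is the failure pattern the abstract describes for verb cues.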