TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text (2402.12881v2)
Abstract: We investigate the knowledge of object affordances in pre-trained language models (PTLMs) and pre-trained Vision-Language Models (VLMs). A growing body of literature shows that PTLMs fail inconsistently and non-intuitively, demonstrating a lack of reasoning and grounding. As a first step toward quantifying the effect of grounding (or the lack thereof), we curate a novel and comprehensive dataset of object affordances -- Text2Afford -- characterized by 15 affordance classes. Unlike affordance datasets collected in the vision and language domains, we annotate in-the-wild sentences with objects and affordances. Experimental results reveal that PTLMs exhibit limited reasoning abilities when it comes to uncommon object affordances. We also observe that pre-trained VLMs do not necessarily capture object affordances effectively. Through few-shot fine-tuning, we demonstrate improved affordance knowledge in both PTLMs and VLMs. Our research contributes a novel dataset for language grounding tasks and presents insights into LM capabilities, advancing the understanding of object affordances. Code and data are available at https://github.com/sayantan11995/Affordance
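To illustrate how such text-only affordance probing can be framed, the sketch below converts a (sentence, object, affordance) triple into NLI-style premise–hypothesis pairs, one per candidate affordance class. The class names, templates, and helper function are hypothetical illustrations, not the dataset's actual schema or the authors' exact prompting setup.

```python
# Hypothetical sketch: framing object-affordance probing as NLI-style
# premise/hypothesis pairs that could be scored by a pre-trained LM.
# The affordance labels below are an illustrative subset, not the
# paper's actual 15-class inventory.

AFFORDANCE_CLASSES = ["grasp", "lift", "push", "sit on", "pour from"]

def to_nli_pairs(sentence: str, obj: str, classes=AFFORDANCE_CLASSES):
    """Build one premise/hypothesis pair per candidate affordance class."""
    return [
        {
            "premise": sentence,
            "hypothesis": f"A person can {aff} the {obj}.",
            "candidate": aff,
        }
        for aff in classes
    ]

pairs = to_nli_pairs("The ceramic vase stood on the shelf.", "vase")
for p in pairs:
    print(p["hypothesis"])
```

Each pair would then be fed to an entailment model (or a cloze-style prompt for masked LMs), and the per-class entailment scores interpreted as the model's affordance predictions.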