Do Neural Language Representations Learn Physical Commonsense? (1908.02899v1)

Published 8 Aug 2019 in cs.CL

Abstract: Humans understand language based on the rich background knowledge about how the physical world works, which in turn allows us to reason about the physical world through language. In addition to the properties of objects (e.g., boats require fuel) and their affordances, i.e., the actions that are applicable to them (e.g., boats can be driven), we can also reason about if-then inferences between what properties of objects imply the kind of actions that are applicable to them (e.g., that if we can drive something then it likely requires fuel). In this paper, we investigate the extent to which state-of-the-art neural language representations, trained on a vast amount of natural language text, demonstrate physical commonsense reasoning. While recent advancements of neural language models have demonstrated strong performance on various types of natural language inference tasks, our study based on a dataset of over 200k newly collected annotations suggests that neural language representations still only learn associations that are explicitly written down.

Citations (102)

Summary

  • The paper assesses whether neural language models learn physical commonsense, finding that they predict object-affordance compatibility well but struggle to infer implicit affordance-property relationships.
  • The study evaluated models like BERT and ELMo on predicting compatibility using novel abstract and situated datasets annotated with object properties and affordances.
  • Results suggest models have limitations in implicit physical reasoning, implying that integrating external knowledge sources or enhancing model architecture may be necessary for deeper understanding.

Neural Language Representations and Physical Commonsense

The paper "Do Neural Language Representations Learn Physical Commonsense?" investigates the degree to which neural LLMs can infer physical commonsense knowledge from natural language text. A central premise of this research is the capacity of LLMs, like ELMo and BERT, to capture underlying physical commonsense through extensive exposure to text data, which implicitly encodes interactions among objects, their properties, and affordances.

Study Overview

The authors focus on assessing whether state-of-the-art neural models understand the interactions between objects, their attributes, and the actions that can be applied to them. They use two novel datasets: an abstract dataset in which objects are annotated independent of any context, and a situated dataset in which objects are evaluated in the context of real-world images. Each dataset is annotated with objects' properties and affordances, allowing the authors to test how well models predict object-property (O↔P), object-affordance (O↔A), and affordance-property (A↔P) compatibility.
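
To make the three tasks concrete, the toy examples below show how compatibility judgments can be encoded as labeled pairs. The words and labels are illustrative stand-ins (drawn from the boat/fuel/drive example in the abstract), not actual entries from the crowdsourced datasets.

```python
# Hypothetical examples of the three compatibility tasks (not actual dataset entries).
# Label 1 means "compatible", 0 means "incompatible".
object_property     = [("boat", "requires fuel", 1), ("pillow", "requires fuel", 0)]    # O↔P
object_affordance   = [("boat", "drive", 1), ("pillow", "drive", 0)]                    # O↔A
affordance_property = [("drive", "requires fuel", 1), ("squeeze", "requires fuel", 0)]  # A↔P
```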

Experimentation and Results

Several neural language representations are evaluated, including static word embeddings such as GloVe and Dependency-Based Embeddings as well as more recent contextualized models such as ELMo and BERT. Each representation is fed to an MLP classifier trained on the compatibility tasks between objects, properties, and affordances.
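
The snippet below is a minimal sketch of such a probing setup, assuming frozen pretrained embeddings and a binary compatibility label per pair; the layer sizes, the random stand-in vectors, and the name CompatibilityProbe are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an MLP compatibility probe over frozen word representations (PyTorch).
# Assumes 300-dimensional vectors (e.g., GloVe-sized); contextual models like ELMo
# or BERT would supply higher-dimensional vectors, but the probe has the same shape.
import torch
import torch.nn as nn

class CompatibilityProbe(nn.Module):
    def __init__(self, embed_dim: int = 300, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),  # concatenated (object, affordance/property) pair
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),              # single compatibility logit
        )

    def forward(self, left_vec: torch.Tensor, right_vec: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([left_vec, right_vec], dim=-1)).squeeze(-1)

# Usage with random stand-in vectors; in practice these would come from the frozen
# pretrained representations (GloVe, Dependency-Based Embeddings, ELMo, BERT).
probe = CompatibilityProbe()
left, right = torch.randn(4, 300), torch.randn(4, 300)  # batch of 4 pairs
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])             # 1 = compatible, 0 = not
loss = nn.BCEWithLogitsLoss()(probe(left, right), labels)
loss.backward()
```

Keeping the probe fixed across the O↔P, O↔A, and A↔P tasks means that differences in accuracy can be attributed to what the underlying representations encode rather than to the classifier itself.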

Results demonstrate robust performance in predicting the affordances associated with objects (O↔A), with models performing comparably to humans. The high accuracy here is attributed to the fact that verb-object interactions are expressed explicitly in text corpora. However, the models fall well short of human performance when asked to infer affordance-property relationships (A↔P). This points to a deficiency in their ability to infer the implicit connections needed for physical reasoning, which are subtler and less frequently articulated in natural language.

Analysis and Implications

Further analysis shows that functional properties in particular are predicted more accurately, likely because they are directly associated with affordances in linguistic data. Perceptual properties, by contrast, show weaker correlations, highlighting the models' limitations in perceptual reasoning.

The paper suggests that simply increasing the volume of training data may not suffice to extract inherent physical commonsense knowledge. Instead, it posits that advancements in model architecture and the incorporation of more robust inductive biases are essential for models to better simulate human-like physical commonsense reasoning.

Future Directions and Theoretical Considerations

Embodied cognition theories are invoked to explain the gap between human and machine reasoning capabilities. The authors suggest exploring physics engines and simulations to enhance neural models' ability to infer physical interactions. Bridging formal representations from simulations with linguistic pretraining could endow models with the deeper understanding required for nuanced commonsense reasoning.

In conclusion, while neural language representations have made significant strides, they remain challenged by tasks that require implicit physical commonsense reasoning. Future research should focus on computational approaches that integrate external knowledge sources to support deeper understanding of, and reasoning about, the physical world.
