
What do you learn from context? Probing for sentence structure in contextualized word representations (1905.06316v1)

Published 15 May 2019 in cs.CL

Abstract: Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. We find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer comparably small improvements on semantic tasks over a non-contextual baseline.

Contextualized Word Representations: Probing for Sentence Structure

The paper "What do you learn from context? Probing for sentence structure in contextualized word representations" by Ian Tenney et al. takes a critical step forward in understanding the capabilities and limitations of contextualized word representation models. Contextualized embeddings such as ELMo and BERT have notably advanced performance across a variety of NLP tasks. However, a precise understanding of the types of linguistic information encoded by these models is essential. This paper addresses this by devising a novel "edge probing" methodology to scrutinize sentence structure encoded in word representations.

Analytical Framework

The researchers introduce an edge probing task suite, designed to probe a broad range of syntactic and semantic phenomena encoded in word representations. This is an extension of token-level probing work, organized to cover various traditional structured NLP tasks such as part-of-speech tagging, constituent labeling, dependency labeling, named entity recognition, semantic role labeling, coreference resolution, and relation classification.
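In edge probing, each task is cast in a common format: a sentence, one or two labeled spans, and a label to predict from the span representations alone. A minimal sketch of such a record as a Python structure is below; the field names are illustrative, not the paper's exact data schema.

```python
def make_edge_probe_example(tokens, span1, span2, label):
    """Build one labeled edge probing example.

    Spans are [start, end) token indices; span2 may be None for
    single-span tasks such as part-of-speech tagging or NER.
    """
    for span in (span1, span2):
        if span is not None:
            start, end = span
            assert 0 <= start < end <= len(tokens), "span out of range"
    return {"tokens": tokens, "span1": span1, "span2": span2, "label": label}

# A dependency-labeling style example: classify the relation between
# a head span and a dependent span.
example = make_edge_probe_example(
    tokens=["The", "cat", "sat", "on", "the", "mat", "."],
    span1=[2, 3],   # head: "sat"
    span2=[1, 2],   # dependent: "cat"
    label="nsubj",
)
```

Casting every task into this one shape is what lets a single probing architecture be reused across the whole suite, from POS tagging (one span) to relation classification (two spans).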

Experimental Design and Methodology

Probing Model

The probing model keeps the parameters of the contextual word representations frozen and passes them through a learned projection layer followed by a self-attentive span pooling operator. This design ensures that the probe cannot learn beyond the information already encoded in the embeddings: the objective is to predict labels for given spans (or span pairs) purely from their contextual representations.
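The projection-plus-pooling step can be sketched in a few lines of numpy. This is a simplified illustration under assumed shapes, not the paper's exact implementation: the token representations are treated as frozen inputs, and only the projection matrix and attention vector would be trained.

```python
import numpy as np

def self_attentive_span_pool(token_reprs, span, w_proj, w_attn):
    """Project frozen token representations, then pool one span with
    learned self-attention (softmax over in-span tokens).

    token_reprs: (seq_len, d_model) frozen contextual embeddings
    span:        (start, end) token indices, end exclusive
    w_proj:      (d_model, d_proj) learned projection matrix
    w_attn:      (d_proj,) learned attention scoring vector
    """
    start, end = span
    h = token_reprs[start:end] @ w_proj      # (span_len, d_proj)
    scores = h @ w_attn                      # (span_len,) per-token scores
    scores = scores - scores.max()           # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ h                       # (d_proj,) pooled span vector

rng = np.random.default_rng(0)
reprs = rng.normal(size=(7, 16))             # e.g. 7 tokens, 16-dim embeddings
pooled = self_attentive_span_pool(
    reprs, (2, 5), rng.normal(size=(16, 8)), rng.normal(size=8)
)
```

The pooled vector (or the concatenation of two such vectors for span-pair tasks) would then feed a small classifier; because the token representations stay fixed, probe accuracy reflects what the pretrained model already encodes.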

Representation Models

Four prominent contextual word representation models—CoVe, ELMo, OpenAI GPT, and BERT—are probed in this paper. Each was pretrained on a different corpus with a different objective, making it possible to contrast not just their performance but also the underlying factors driving their representations.

Results and Findings

The analysis reveals several interesting aspects of how these models encode syntactic versus semantic information:

  1. Syntactic Encoding Dominance: The largest gains from contextual representations appear on syntactic tasks such as dependency labeling and constituent labeling, compared to semantic tasks like coreference resolution. This suggests these representations encode syntactic structure more strongly than semantic relations.
  2. Local vs. Long-Range Dependencies: Simple CNN layers over non-contextual embeddings recovered much of the performance on tasks dominated by short-range dependencies. The full ELMo model still outperformed them, however, indicating that pretrained contextualized embeddings also encode useful long-range linguistic information.
  3. Comparative Performance: Among the models, BERT, particularly BERT-large, significantly outperforms the others on tasks such as OntoNotes coreference, achieving more than a 40% relative error reduction compared to ELMo. This points to deeper and more complex interactions in its learned representations.
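Relative error reduction, the metric behind the BERT-vs-ELMo comparison above, is straightforward to compute. The numbers in the example are illustrative only, not figures from the paper:

```python
def relative_error_reduction(base_err, new_err):
    """Fraction of the baseline's error eliminated by the new model."""
    return (base_err - new_err) / base_err

# Illustrative: cutting error from 10% to 6% is a 40% relative reduction,
# even though the absolute accuracy gain is only 4 points.
reduction = relative_error_reduction(0.10, 0.06)
```

Reporting relative error reduction rather than raw accuracy deltas makes gains comparable across tasks whose baseline accuracies differ widely.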

Implications and Speculations

The implications of these findings are multi-faceted. Practically, syntactically informed word representations can be crucial for applications needing precise syntactic understanding, such as language parsing and syntactic error correction. The limitations observed in semantic encoding suggest that while contextual embeddings are adept at capturing structural properties, further enhancement may be needed in areas requiring deep semantic understanding.

Future Directions

Future work should focus on:

  • Model Adaptation: Exploring fine-tuning strategies to enhance semantic information encoding.
  • Broader Task Suites: Expanding probing tasks to encompass more diverse and complicated linguistic phenomena.
  • Hybrid Models: Combining syntactic and semantic-focused pretraining objectives to achieve more balanced representations.

This paper offers a precise mechanism and comprehensive suite for probing the inner workings and efficacy of contextualized word representations, forming a foundational benchmark for both researchers and practitioners in NLP. The proposed methodologies and findings pave the way for advancing our understanding and improving the design of future NLP models.

Authors (11)
  1. Ian Tenney
  2. Patrick Xia
  3. Berlin Chen
  4. Alex Wang
  5. Adam Poliak
  6. Najoung Kim
  7. Benjamin Van Durme
  8. Samuel R. Bowman
  9. Dipanjan Das
  10. Ellie Pavlick
  11. R. Thomas McCoy
Citations (816)