
Matching the Blanks: Distributional Similarity for Relation Learning (1906.03158v1)

Published 7 Jun 2019 in cs.CL and cs.AI

Abstract: General purpose relation extractors, which can model arbitrary relations, are a core aspiration in information extraction. Efforts have been made to build general purpose extractors that represent relations with their surface forms, or which jointly embed surface forms with relations from an existing knowledge graph. However, both of these approaches are limited in their ability to generalize. In this paper, we build on extensions of Harris' distributional hypothesis to relations, as well as recent advances in learning text representations (specifically, BERT), to build task agnostic relation representations solely from entity-linked text. We show that these representations significantly outperform previous work on exemplar based relation extraction (FewRel) even without using any of that task's training data. We also show that models initialized with our task agnostic representations, and then tuned on supervised relation extraction datasets, significantly outperform the previous methods on SemEval 2010 Task 8, KBP37, and TACRED.

Matching the Blanks: Distributional Similarity for Relation Learning

The paper "Matching the Blanks: Distributional Similarity for Relation Learning" by Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski addresses the significant challenge of developing general-purpose relation extractors in information extraction (IE). The authors build on Harris' distributional hypothesis and recent advances in text representation learning (specifically BERT) to propose a task-agnostic method for generating relation representations derived solely from entity-linked text. This approach aims to overcome the generalization limitations present in previous methodologies.

Methodology

The authors delineate different strategies for representing relations using the Transformer architecture, particularly BERT. The research introduces a novel training objective called "matching the blanks": entity mentions in relation statements are replaced with a special [BLANK] symbol with high probability, and the model is trained to decide whether two masked relation statements refer to the same pair of linked entities. The objective is a relational variant of the cloze-style test and requires only entity-linked text, not relation labels.
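
A minimal sketch of how such training pairs could be constructed is shown below, assuming a corpus of entity-linked relation statements. The names (RelationStatement, make_mtb_pairs) and the pairing loop are illustrative rather than the authors' implementation; the blanking probability of 0.7 follows the value reported in the paper.

```python
# Illustrative sketch of "matching the blanks" pair construction from
# entity-linked text. Not the authors' code; names are hypothetical.
import random
from dataclasses import dataclass
from itertools import combinations

BLANK = "[BLANK]"   # placeholder token that replaces an entity mention
ALPHA = 0.7         # probability of blanking each mention (value used in the paper)

@dataclass
class RelationStatement:
    tokens: list      # sentence tokens
    e1_span: tuple    # (start, end) token indices of the first entity mention
    e2_span: tuple    # (start, end) token indices of the second entity mention
    e1_id: str        # linked entity id of the first mention
    e2_id: str        # linked entity id of the second mention

def mask_entities(stmt: RelationStatement) -> list:
    """Replace each entity mention with [BLANK] with probability ALPHA."""
    tokens = list(stmt.tokens)
    # Replace the later span first so earlier indices stay valid.
    for start, end in sorted([stmt.e1_span, stmt.e2_span], reverse=True):
        if random.random() < ALPHA:
            tokens[start:end] = [BLANK]
    return tokens

def make_mtb_pairs(statements):
    """Yield (tokens_a, tokens_b, label) pairs: label is 1 if both statements
    link the same entity pair, else 0. In practice pairs are sampled rather
    than enumerated exhaustively as done here."""
    for a, b in combinations(statements, 2):
        label = int((a.e1_id, a.e2_id) == (b.e1_id, b.e2_id))
        yield mask_entities(a), mask_entities(b), label
```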

The primary contributions of this paper can be summarized as follows:

  1. Relation Representation Learning: The authors propose several architectures built on BERT to evaluate how well these models can encode relations between entity pairs. They vary the input representation (standard tokens, positional embeddings, and entity markers) and the method for extracting a fixed-length relation representation ([CLS] token, mention pooling, and entity start state). Their experiments show that entity markers on the input combined with the entity start states on the output significantly outperform the other variants (see the sketch after this list).
  2. Matching the Blanks Objective: To train relation representations without labeled data, the "matching the blanks" task is introduced. The paper hypothesizes that relation statements mentioning the same entity pair, even when the entity mentions are masked, should yield similar relation representations. This approach uses only entity-linked text, avoiding reliance on predefined ontologies or large amounts of labeled data.
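
To make the input and output choices concrete, here is a hedged sketch of the "entity markers, entity start state" variant. It uses Hugging Face Transformers and bert-base-uncased as stand-ins; the marker tokens [E1], [/E1], [E2], [/E2] follow the paper, but the model checkpoint, library calls, and example sentence are assumptions rather than the authors' code.

```python
# Sketch of the "entity markers, entity start" relation representation,
# using Hugging Face Transformers as a stand-in for the original codebase.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Add the entity marker tokens and resize the embedding matrix accordingly.
markers = ["[E1]", "[/E1]", "[E2]", "[/E2]"]
tokenizer.add_special_tokens({"additional_special_tokens": markers})
model.resize_token_embeddings(len(tokenizer))

text = "[E1] Mozart [/E1] was born in [E2] Salzburg [/E2] ."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state  # (1, seq_len, 768)

# The relation representation is the concatenation of the hidden states at the
# positions of the [E1] and [E2] start markers.
ids = enc["input_ids"][0]
e1_pos = (ids == tokenizer.convert_tokens_to_ids("[E1]")).nonzero(as_tuple=True)[0][0]
e2_pos = (ids == tokenizer.convert_tokens_to_ids("[E2]")).nonzero(as_tuple=True)[0][0]
relation_repr = torch.cat([hidden[0, e1_pos], hidden[0, e2_pos]])  # shape: (1536,)
```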

Experimental Results

The authors rigorously test their proposed methods against established benchmarks and tasks:

  • FewRel: For exemplar-based relation extraction, the proposed model (BERT + MTB) significantly outperforms existing state-of-the-art methods on the FewRel dataset, even without fine-tuning on FewRel's training data. The task-agnostic model demonstrates substantial improvements of 8.8% in the 5-way-1-shot task and 12.7% in the 10-way-1-shot setting, showcasing the efficacy of the matching-the-blanks training (a minimal scoring sketch follows this list).
  • Supervised Tasks: The supervised versions of the model achieve state-of-the-art performance on the SemEval 2010 Task 8, KBP37, and TACRED datasets. Fine-tuning after the "matching the blanks" pre-training further enhances performance, with notable improvements in low-resource settings where only a fraction of the training data is available.
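
For the exemplar-based setting, a simple way to picture the evaluation is nearest-neighbor classification over relation representations: a query statement is scored against one exemplar per candidate relation and assigned the best-matching class. The sketch below uses a dot-product score; the function name, labels, and random stand-in vectors are illustrative, not the paper's evaluation harness.

```python
# Hedged sketch of exemplar-based (few-shot) relation classification by
# representation similarity; an illustrative simplification, not the paper's code.
import torch

def classify_few_shot(query_repr: torch.Tensor,
                      support_reprs: torch.Tensor,
                      support_labels: list) -> str:
    """query_repr: (d,); support_reprs: (num_support, d); returns predicted label."""
    scores = support_reprs @ query_repr          # dot-product similarity to each exemplar
    return support_labels[int(scores.argmax())]

# Example: 5-way 1-shot with random stand-in representations.
d = 1536
support = torch.randn(5, d)
labels = ["founded_by", "born_in", "capital_of", "spouse", "member_of"]
query = support[1] + 0.01 * torch.randn(d)        # a query close to the "born_in" exemplar
print(classify_few_shot(query, support, labels))  # -> "born_in"
```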

Implications and Future Work

This research has profound implications for the field of information extraction. The ability to create accurate relation extractors with minimal labeled data could significantly reduce the manual labor involved in constructing such systems. Practically, this method can be utilized in numerous domains where relation extraction is crucial, including knowledge base population and natural language understanding.

Future developments could explore leveraging the learned embeddings for more complex tasks such as clustering relation statements into relation types or contributing to distributed representations of knowledge graphs. The integration of these techniques into real-world IE systems could enhance their flexibility and scalability.

This paper contributes significantly to advancing the state of the art in relation extraction by introducing a novel, efficient, and effective methodology for unsupervised relation learning, paving the way for more general-purpose and robust information extraction systems.

Authors (4)
  1. Livio Baldini Soares (18 papers)
  2. Nicholas FitzGerald (15 papers)
  3. Jeffrey Ling (7 papers)
  4. Tom Kwiatkowski (21 papers)
Citations (731)