Graph Convolution over Pruned Dependency Trees Improves Relation Extraction
The paper "Graph Convolution over Pruned Dependency Trees Improves Relation Extraction" by Yuhao Zhang, Peng Qi, and Christopher D. Manning examines the application of graph convolutional networks (GCNs) to the task of relation extraction (RE). The authors propose a novel model that leverages syntactic dependency trees to efficiently capture and process long-range word dependencies, ultimately yielding state-of-the-art results on benchmark datasets.
Overview
Relation extraction involves determining whether a specific relation holds between two identified entities within a sentence. Dependency trees provide a valuable mechanism for encapsulating syntactic dependencies that traditional sequence-based models may overlook. However, previous dependency-based models either pruned the tree too aggressively, discarding crucial contextual information, or incurred excessive computational cost because their tree-structured computations are difficult to parallelize.
The primary contributions of this paper are twofold:
- GCNs Tailored for Relation Extraction: The authors extend GCNs to handle arbitrary dependency structures in a manner that supports efficient parallel processing.
- Path-Centric Pruning Strategy: A novel pruning technique that retains relevant contextual words close to the shortest path between entities, thus balancing information retention with noise reduction.
Model Architecture
Graph Convolutional Networks over Dependency Trees
The core innovation lies in adapting GCNs to operate over dependency trees. Each tree is converted into an adjacency matrix, which the GCN uses to aggregate and propagate information across graph nodes representing tokens. Each GCN layer adds self-loops, so a token retains its own features, and normalizes activations by node degree, so that representation magnitudes do not vary with how many neighbors a token has.
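As a rough illustration, the sketch below implements one such layer in plain NumPy, assuming token vectors stacked into a matrix H and a symmetric 0/1 adjacency matrix A built from the parse; the function name and shapes are illustrative rather than taken from the authors' code.

```python
import numpy as np

def gcn_layer(H, A, W, b):
    """One graph-convolution layer over a dependency tree (illustrative sketch).

    H: (n, d_in) token representations from the previous layer
    A: (n, n) 0/1 adjacency matrix of the dependency tree, treated as undirected
    W: (d_in, d_out) weight matrix, b: (d_out,) bias
    """
    n = A.shape[0]
    A_tilde = A + np.eye(n)                  # self-loops: each token keeps its own features
    d = A_tilde.sum(axis=1, keepdims=True)   # node degrees, used to normalize activations
    H_out = (A_tilde @ H) @ W / d + b        # aggregate neighbors, transform, normalize
    return np.maximum(H_out, 0.0)            # ReLU nonlinearity
```

Stacking several such layers lets information propagate between tokens that are multiple edges apart in the tree, one hop per layer.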
Encoding Relations with GCN
The network employs multiple GCN layers to derive hidden representations for each token. These representations are combined through a max-pooling operation to form a sentence-level embedding. The model also extracts entity-specific embeddings by pooling over the tokens of the subject and object entities. The final representation, a concatenation of the sentence-level and entity-level embeddings, is fed into a feedforward neural network for relation classification.
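A minimal sketch of this pooling-and-concatenation step is shown below, continuing the NumPy setting above; `subj_idx` and `obj_idx` are assumed lists of token positions, and the single output layer simplifies the feedforward network used in the paper.

```python
def relation_logits(H, subj_idx, obj_idx, W_out, b_out):
    """Combine sentence- and entity-level representations for classification (sketch).

    H: (n, d) token representations produced by the GCN layers
    subj_idx, obj_idx: token positions of the subject and object entities
    W_out: (3 * d, num_relations) output weights, b_out: (num_relations,) bias
    """
    h_sent = H.max(axis=0)             # max-pool over all tokens -> sentence embedding
    h_subj = H[subj_idx].max(axis=0)   # max-pool over subject tokens
    h_obj = H[obj_idx].max(axis=0)     # max-pool over object tokens
    h_final = np.concatenate([h_sent, h_subj, h_obj])
    return h_final @ W_out + b_out     # unnormalized scores over relation types
```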
Contextualized GCN
To address potential parsing errors and incorporate word-order information, the authors introduce a variant termed Contextualized GCN (C-GCN). This variant first feeds the input token vectors through a bi-directional LSTM, so that the GCN layers operate on context-aware representations rather than raw embeddings.
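The sketch below shows how such a contextualizing layer might wrap the GCN stack, written in PyTorch for brevity; the class name and constructor arguments are illustrative, and each element of `gcn_layers` is assumed to map (hidden states, adjacency matrix) to new hidden states.

```python
import torch.nn as nn

class ContextualizedGCN(nn.Module):
    """Illustrative sketch of the C-GCN idea: a BiLSTM contextualizes token
    embeddings before they are passed to the GCN layers."""
    def __init__(self, emb_dim, lstm_dim, gcn_layers):
        super().__init__()
        # the BiLSTM injects word-order and contextual information
        self.bilstm = nn.LSTM(emb_dim, lstm_dim, batch_first=True, bidirectional=True)
        self.gcn_layers = nn.ModuleList(gcn_layers)

    def forward(self, embeddings, adjacency):
        h, _ = self.bilstm(embeddings)      # (batch, n, 2 * lstm_dim)
        for layer in self.gcn_layers:       # graph convolutions over the dependency tree
            h = layer(h, adjacency)
        return h
```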
Path-Centric Pruning
The pruning strategy is a critical enhancement: it retains tokens that are at most distance K away from the shortest dependency path between the two entities, so K = 0 keeps only the path itself while larger values keep progressively more of the tree. Empirically, K = 1 provides the best balance, preserving relevant off-path information while discarding extraneous content.
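A minimal sketch of this pruning step, using networkx for the tree traversal, is given below; it assumes the parse is given as a list of (head, dependent) edges and that each entity is represented by a single head token, and it glosses over the paper's restriction to the subtree rooted at the entities' lowest common ancestor.

```python
import networkx as nx

def path_centric_prune(edges, subj_idx, obj_idx, k=1):
    """Keep tokens within distance K of the dependency path between the entities (sketch).

    edges: list of (head, dependent) token-index pairs from the dependency parse
    subj_idx, obj_idx: head token positions of the subject and object entities
    Returns the set of token indices to keep; k=0 reduces to the shortest path itself.
    """
    g = nx.Graph(edges)                                        # treat the tree as undirected
    path = nx.shortest_path(g, subj_idx, obj_idx)              # dependency path between entities
    dist = nx.multi_source_dijkstra_path_length(g, set(path))  # distance of every token to the path
    return {tok for tok, d in dist.items() if d <= k}
```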
Experimental Results
The models were evaluated on two datasets: TACRED and SemEval 2010 Task 8. Significant improvements were observed:
- TACRED: The C-GCN achieved state-of-the-art performance with an F1 score of 66.4, surpassing previous dependency-based and sequence-based models. Combining C-GCN with PA-LSTM through simple interpolation of their predicted probabilities (sketched after this list) further improved the F1 score to 68.2.
- SemEval: The C-GCN outperformed existing models, demonstrating robustness both when entity mentions are visible and when they are masked.
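The interpolation reported for TACRED can be pictured as in the sketch below, which assumes each model outputs a probability distribution over relation types; the mixing weight `alpha` is a hyperparameter here, not a value from the paper.

```python
import numpy as np

def interpolate_predictions(p_cgcn, p_palstm, alpha=0.5):
    """Combine two models' relation probabilities by linear interpolation (sketch).

    p_cgcn, p_palstm: arrays of shape (num_relations,), each summing to 1
    alpha: mixing weight; 0.5 is only a placeholder, not the paper's setting
    """
    p = alpha * p_cgcn + (1 - alpha) * p_palstm
    return int(np.argmax(p))           # index of the predicted relation
```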
Analysis and Implications
Through detailed ablation studies, the paper highlights the contribution of each model component. The entity representations and feedforward layers account for a substantial share of the performance, and removing the GCN layers that encode the dependency structure also degrades results, underscoring the value of syntactic information.
A comparative analysis revealed that GCN-based models are particularly effective at handling long-range dependencies in sentences, while sequence models like PA-LSTM excel at leveraging local word patterns, even under suboptimal parsing conditions. This complementary strength underscores the potential of hybrid modeling approaches.
Future Directions
The findings suggest several avenues for future research:
- Exploring More Complex Pruning Strategies: Further refining pruning techniques could offer even better performance by dynamically adjusting the pruning distance based on syntactic importance.
- Integration with Other Neural Models: Further integration of GCNs with pre-trained contextualized language models could provide richer syntactic and semantic representations, potentially boosting performance on more diverse datasets.
- Application Beyond Relation Extraction: The proposed GCN architecture could benefit other NLP tasks that rely heavily on syntactic structure, such as semantic role labeling or event extraction.
Overall, this paper makes significant strides in leveraging graph convolutional networks for relation extraction, presenting a robust, efficient approach that delivers state-of-the-art performance through innovative use of syntactic dependencies and thoughtful pruning strategies.