Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion (2004.14781v2)

Published 30 Apr 2020 in cs.CL

Abstract: Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks, but these graphs are usually incomplete, urging auto-completion of them. Prevalent graph embedding approaches, e.g., TransE, learn structured knowledge via representing graph elements into dense embeddings and capturing their triple-level relationship with spatial distance. However, they are hardly generalizable to the elements never visited in training and are intrinsically vulnerable to graph incompleteness. In contrast, textual encoding approaches, e.g., KG-BERT, resort to graph triple's text and triple-level contextualized representations. They are generalizable enough and robust to the incompleteness, especially when coupled with pre-trained encoders. But two major drawbacks limit the performance: (1) high overheads due to the costly scoring of all possible triples in inference, and (2) a lack of structured knowledge in the textual encoder. In this paper, we follow the textual encoding paradigm and aim to alleviate its drawbacks by augmenting it with graph embedding techniques -- a complementary hybrid of both paradigms. Specifically, we partition each triple into two asymmetric parts as in translation-based graph embedding approach, and encode both parts into contextualized representations by a Siamese-style textual encoder. Built upon the representations, our model employs both deterministic classifier and spatial measurement for representation and structure learning respectively. Moreover, we develop a self-adaptive ensemble scheme to further improve the performance by incorporating triple scores from an existing graph embedding model. In experiments, we achieve state-of-the-art performance on three benchmarks and a zero-shot dataset for link prediction, with highlights of inference costs reduced by 1-2 orders of magnitude compared to a textual encoding method.

Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion

This paper proposes a hybrid approach to knowledge graph completion (KGC) that combines the textual encoding and graph embedding paradigms. The goal is to improve the generalization, robustness, and inference efficiency of KGC models, which are often hampered by the incompleteness of human-curated knowledge graphs and the limitations of existing graph embedding techniques.

Prevalent KGC methods fall broadly into graph embedding approaches and textual encoding approaches. Graph embedding methods, such as TransE and RotatE, represent entities and relations as dense vectors and capture triple-level interactions through spatial distance measures. While effective at modeling structured knowledge, they generalize poorly to entities unseen during training and are vulnerable to graph incompleteness. Conversely, textual encoding models, such as KG-BERT, encode the textual descriptions of graph triples with pre-trained language models (e.g., BERT), which makes them far more generalizable and robust. However, they incur high inference costs, since every candidate triple must be scored by the encoder, and they incorporate little structured knowledge.
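To make the contrast concrete, the following is a minimal sketch of a translation-based scorer in the style of TransE (illustrative only, not the paper's code; dimensions and IDs are arbitrary). Because every entity must have a trained embedding, an entity never seen during training simply cannot be scored, which is the generalization gap textual encoders aim to close.

```python
import torch

# Illustrative TransE-style scorer (not the paper's implementation).
# Sizes below are arbitrary assumptions for the sketch.
num_entities, num_relations, dim = 10_000, 200, 128
entity_emb = torch.nn.Embedding(num_entities, dim)
relation_emb = torch.nn.Embedding(num_relations, dim)

def transe_score(head_ids, rel_ids, tail_ids):
    h = entity_emb(head_ids)
    r = relation_emb(rel_ids)
    t = entity_emb(tail_ids)
    # A triple (h, r, t) is plausible if h + r lands close to t;
    # smaller distance => higher score.
    return -torch.norm(h + r - t, p=1, dim=-1)

# Score a batch of two candidate triples.
scores = transe_score(torch.tensor([3, 7]),
                      torch.tensor([0, 5]),
                      torch.tensor([42, 9]))
```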

The authors introduce the Structure-Augmented Text Representation (StAR) model, a hybrid approach that addresses these challenges by combining elements from both paradigms. The key idea is to partition each triple into two asymmetric parts, as in translation-based graph embedding, and encode both parts into contextualized representations with a Siamese-style textual encoder. These representations feed two learning objectives: a deterministic classification objective and a spatial structure learning objective. The former learns effective representations for entities and relations via a binary neural classifier, while the latter learns structured knowledge by modeling spatial relationships between the contextualized embeddings.
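The sketch below illustrates this two-objective design under stated assumptions: the placeholder encoder (embedding plus mean pooling) stands in for the pre-trained Transformer used in the paper, and the feature combination fed to the classifier is an illustrative choice, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class StARSketch(nn.Module):
    """Minimal sketch of the StAR idea (not the authors' implementation).

    A shared ("Siamese") text encoder maps the (head, relation) text to u
    and the tail text to v. Two objectives are then applied:
      * a deterministic binary classifier over combined features, and
      * a translation-style spatial distance between u and v.
    """

    def __init__(self, vocab_size=30_000, dim=256):
        super().__init__()
        # Placeholder encoder: token embedding + mean pooling stands in
        # for a pre-trained Transformer encoder such as BERT/RoBERTa.
        self.token_emb = nn.Embedding(vocab_size, dim)
        # Assumed feature combination [u; v; u*v; u-v] for the classifier.
        self.classifier = nn.Linear(4 * dim, 2)

    def encode(self, token_ids):
        return self.token_emb(token_ids).mean(dim=1)  # [batch, dim]

    def forward(self, head_rel_ids, tail_ids):
        u = self.encode(head_rel_ids)   # contextualized (head, relation) part
        v = self.encode(tail_ids)       # contextualized tail part
        # Representation learning: binary plausibility classification.
        feats = torch.cat([u, v, u * v, u - v], dim=-1)
        class_logits = self.classifier(feats)
        # Structure learning: translation-style distance between the parts.
        distance_score = -torch.norm(u - v, p=2, dim=-1)
        return class_logits, distance_score

model = StARSketch()
logits, dist = model(torch.randint(0, 30_000, (2, 16)),
                     torch.randint(0, 30_000, (2, 8)))
```

Because the tail part is encoded independently of the (head, relation) part, its representation can be cached and reused across queries, which is what lets this design avoid KG-BERT's exhaustive per-triple scoring at inference time.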

In experiments, StAR achieves state-of-the-art performance across several benchmark datasets while reducing inference costs by one to two orders of magnitude relative to textual encoding models such as KG-BERT. The paper reports improvements in metrics such as Hits@1, Hits@10, and mean rank (MR), demonstrating the model's effectiveness across KGC tasks. Furthermore, the proposed self-adaptive ensemble strategy, which merges triple scores from StAR and an existing graph embedding model, further boosts performance, particularly in zero-shot settings with unseen entities or relations.
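As a rough illustration of score ensembling, the sketch below combines normalized scores from the two models with a mixing weight. This is an assumption for exposition: the paper's scheme chooses the weighting self-adaptively rather than using a single fixed value as shown here.

```python
import torch

def ensemble_scores(star_scores, kge_scores, alpha):
    """Illustrative score ensembling (assumption, not the paper's exact scheme).

    star_scores : scores from the textual StAR model for candidate tails
    kge_scores  : scores from a graph embedding model (e.g., RotatE)
                  for the same candidates
    alpha       : mixing weight in [0, 1]; the paper adapts this per query
                  instead of fixing it globally.
    """
    def norm(s):
        # Min-max normalize so the two models' scores are comparable.
        return (s - s.min()) / (s.max() - s.min() + 1e-8)
    return alpha * norm(star_scores) + (1.0 - alpha) * norm(kge_scores)

combined = ensemble_scores(torch.randn(100), torch.randn(100), alpha=0.5)
```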

Practically, the model offers a promising answer to the efficiency and generalization problems of existing KGC approaches, enabling faster and more reliable predictions on large-scale graphs. Theoretically, the integration of text and structure learning points toward hybrid models applicable across domains that require combining structured knowledge with rich textual context. The paper suggests that further refining this hybrid approach could benefit not only existing NLP tasks but also a broader range of knowledge-intensive tasks.

Authors (5)
  1. Bo Wang (823 papers)
  2. Tao Shen (87 papers)
  3. Guodong Long (115 papers)
  4. Tianyi Zhou (172 papers)
  5. Yi Chang (150 papers)
Citations (5)