Knowledge Graph Embedding for Link Prediction: A Comparative Analysis (2002.00819v4)

Published 3 Feb 2020 in cs.LG, cs.DB, and stat.ML

Abstract: Knowledge Graphs (KGs) have found many applications in industry and academic settings, which, in turn, have motivated considerable research efforts towards large-scale information extraction from a variety of sources. Despite such efforts, it is well known that even state-of-the-art KGs suffer from incompleteness. Link Prediction (LP), the task of predicting missing facts among entities already in a KG, is a promising and widely studied task aimed at addressing KG incompleteness. Among the recent LP techniques, those based on KG embeddings have achieved very promising performance in some benchmarks. Despite the fast-growing literature on the subject, insufficient attention has been paid to the effect of the various design choices in those methods. Moreover, the standard practice in this area is to report accuracy by aggregating over a large number of test facts in which some entities are over-represented; this allows LP methods to exhibit good performance by just attending to structural properties that include such entities, while ignoring the remaining majority of the KG. This analysis provides a comprehensive comparison of embedding-based LP methods, extending the dimensions of analysis beyond what is commonly available in the literature. We experimentally compare effectiveness and efficiency of 16 state-of-the-art methods, consider a rule-based baseline, and report detailed analysis over the most popular benchmarks in the literature.

Citations (314)

Summary

  • The paper finds that tensor decomposition models, particularly ComplEx with N3 regularization, consistently achieve robust performance across varied benchmarks.
  • The paper compares tensor, geometric, and deep learning approaches, showing that geometric models use spatial transforms while deep models capture complex dependencies despite higher computational costs.
  • The paper highlights that dataset structure, including relational paths and peer relations, significantly influences link prediction accuracy and model effectiveness.

Knowledge Graph Embedding for Link Prediction: A Comparative Analysis

The paper "Knowledge Graph Embedding for Link Prediction: A Comparative Analysis" by Andrea Rossi et al. provides an in-depth examination of the various methods used for link prediction (LP) in Knowledge Graphs (KGs) through the lens of KG embeddings. This comparative paper is framed against the backdrop of the inherent incompleteness of state-of-the-art KGs, which motivates the need for effective link prediction techniques.

Overview of Link Prediction Techniques

The authors categorize LP methods into three main families: Tensor Decomposition Models, Geometric Models, and Deep Learning Models. Each family encompasses a range of models characterized by distinct architectural choices and operational mechanics.

  1. Tensor Decomposition Models: These models treat LP as a tensor completion problem, factorizing the KG's three-way adjacency tensor into low-dimensional entity and relation embeddings. Notable models in this category include DistMult, ComplEx, Analogy, SimplE, and TuckER. This family often achieves competitive results because it models interactions within the multi-relational space efficiently, using relatively lightweight parameter spaces.
  2. Geometric Models: Models under this category, like TransE, CrossE, and RotatE, interpret relations as spatial transformations in the vector space. These approaches are compelling as they draw parallels with word embedding methods, attempting to capture relational patterns via geometric structures like translations and rotations.
  3. Deep Learning Models: This family exploits neural networks, including convolutional and recurrent architectures, to learn latent representations of KGs. Models such as ConvE, ConvKB, and RSN are particularly potent at capturing complex dependencies through neural representations, but at the cost of increased computational complexity. (Minimal scoring-function sketches for representative models of the first two families follow this list.)
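
To ground the taxonomy, here is a minimal NumPy sketch of the published scoring functions for representative tensor decomposition and geometric models (DistMult, ComplEx, TransE, RotatE). It covers only the scoring step; training details such as loss functions, negative sampling, and regularizers (e.g., the N3 regularization used with ComplEx) are omitted, and the embeddings below are random toy values, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # toy embedding dimension for illustration

def distmult_score(h, r, t):
    """DistMult (tensor decomposition): trilinear product <h, r, t>."""
    return np.sum(h * r * t)

def complex_score(h, r, t):
    """ComplEx (tensor decomposition): Re(<h, r, conj(t)>) over complex embeddings."""
    return np.real(np.sum(h * r * np.conj(t)))

def transe_score(h, r, t):
    """TransE (geometric): relation as a translation; smaller distance = more plausible."""
    return -np.linalg.norm(h + r - t)

def rotate_score(h, phase, t):
    """RotatE (geometric): relation as an element-wise rotation in the complex plane."""
    return -np.linalg.norm(h * np.exp(1j * phase) - t)

# Real-valued toy embeddings for DistMult / TransE
h, r, t = rng.normal(size=(3, dim))
print(distmult_score(h, r, t), transe_score(h, r, t))

# Complex-valued toy embeddings for ComplEx / RotatE
hc = rng.normal(size=dim) + 1j * rng.normal(size=dim)
tc = rng.normal(size=dim) + 1j * rng.normal(size=dim)
rc = rng.normal(size=dim) + 1j * rng.normal(size=dim)
phase = rng.uniform(0, 2 * np.pi, size=dim)
print(complex_score(hc, rc, tc), rotate_score(hc, phase, tc))
```

In all four cases a higher score indicates a more plausible triple; for the distance-based geometric models the negated norm serves that purpose.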

Experimental Methodology and Results

The analysis spans 16 state-of-the-art models evaluated against five well-known datasets: FB15k, FB15k-237, WN18, WN18RR, and YAGO3-10. These datasets are selected for their widespread usage and together constitute the most comprehensive benchmarks for the LP task.
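
These benchmarks are commonly distributed as plain-text train/valid/test splits of tab-separated triples. Below is a minimal loader sketch under that assumption; the file path is hypothetical.

```python
from pathlib import Path

def load_triples(path):
    """Load one benchmark split, assuming tab-separated lines of the form:
    head <TAB> relation <TAB> tail (the common distribution format)."""
    triples = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        head, rel, tail = line.split("\t")
        triples.append((head, rel, tail))
    return triples

# train = load_triples("FB15k-237/train.txt")  # hypothetical local path
```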

  • Efficiency: Measured in terms of training and prediction times, the paper highlights the variability in computational demands across different models and datasets. Tensor Decomposition Models generally exhibit lower training times, whereas Deep Learning Models are more computationally intensive due to their complex architectures.
  • Effectiveness: Various structural features of the training data are explored, such as the number of peers and relational path support, to understand their impact on predictions. A detailed performance analysis reveals that models like ComplEx with N3 regularization are consistently effective across different datasets, indicating robustness to diverse KG characteristics (the filtered ranking metrics underlying these comparisons are sketched below).
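
Effectiveness in this literature is typically reported with filtered ranking metrics, Mean Reciprocal Rank (MRR) and Hits@K, which this paper also uses. The following is a minimal sketch of how those metrics are computed for one test fact; the candidate scores and indices are invented for illustration.

```python
import numpy as np

def filtered_rank(scores, target_idx, known_true_idx):
    """Rank of the target entity after filtering out other entities that
    also form true triples (the standard 'filtered' setting)."""
    scores = scores.astype(float).copy()
    competing = [i for i in known_true_idx if i != target_idx]
    scores[competing] = -np.inf          # remove other known-true answers
    # rank 1 = best; tie-breaking policies vary across papers and are omitted here
    return int(np.sum(scores > scores[target_idx])) + 1

def mrr_and_hits(ranks, k=10):
    """Aggregate MRR and Hits@K over a list of per-fact ranks."""
    ranks = np.asarray(ranks, dtype=float)
    return (1.0 / ranks).mean(), (ranks <= k).mean()

# Toy example: 5 candidate tail entities for one test fact
scores = np.array([0.1, 0.9, 0.4, 0.8, 0.3])
rank = filtered_rank(scores, target_idx=3, known_true_idx=[1, 3])  # entity 1 is another true tail
mrr, hits10 = mrr_and_hits([rank], k=10)
print(rank, mrr, hits10)
```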

Implications and Future Directions

The paper underscores the nuanced interaction between model architecture, training data structure, and task performance. The inherent robustness of Tensor Decomposition Models hints at their potential to remain a mainstay in LP tasks, although newer roto-translational models like RotatE also show promise.

The authors identify key challenges, such as handling relations of arity beyond binary and the need for standardized evaluation policies. Addressing these could improve LP performance across varied KG domains and reduce the semantic loss introduced by data transformations such as Star2Clique (S2C) used in dataset construction.

In conclusion, this comprehensive analysis offers not only a clear comparison of current methodologies but also insights into structural influences and potential future advancements in the field of KG embeddings for LP. The observations made provide a strong foundation for further research into expanding the capabilities and efficiency of link prediction models.