A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs

Published 14 Mar 2022 in cs.LG and cs.AI | (2203.07544v2)

Abstract: The link prediction task on knowledge graphs without explicit negative triples in the training data motivates the usage of rank-based metrics. Here, we review existing rank-based metrics and propose desiderata for improved metrics to address lack of interpretability and comparability of existing metrics to datasets of different sizes and properties. We introduce a simple theoretical framework for rank-based metrics upon which we investigate two avenues for improvements to existing metrics via alternative aggregation functions and concepts from probability theory. We finally propose several new rank-based metrics that are more easily interpreted and compared accompanied by a demonstration of their usage in a benchmarking of knowledge graph embedding models.

Summary

- The paper introduces a unified framework for rank-based evaluation metrics, motivated by the absence of explicit negative samples in KG link prediction.
- It proposes alternative aggregation functions, such as harmonic and geometric mean ranks, and applies probabilistic adjustments via expectation normalization, index transformations, and z-scores.
- The adjusted metrics reduce dependence on dataset size, improving the consistency and comparability of evaluations across datasets and applications, including biomedical research.

A Unified Framework for Rank-Based Evaluation Metrics for Link Prediction in Knowledge Graphs

The paper "A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs" by Charles Tapley Hoyt et al. addresses the challenge of evaluating link prediction on Knowledge Graphs (KGs) in the absence of explicit negative samples. It critiques the rank-based metrics in current use and proposes a theoretical and practical framework for evaluation that improves interpretability and comparability across datasets of varying sizes and properties.

Knowledge Graphs are structured formalisms representing facts about entities and their interrelations. They are typically constructed under the open-world assumption (OWA): the graph may be incomplete, and a missing triple is not assumed to be false. Link prediction on KGs is therefore a positive-unlabeled problem, in which only positive triples are known and an absent triple may be either false or merely unobserved. Without reliable negatives, traditional classification metrics like accuracy or F1-score are hard to apply, so evaluation relies on rank-based metrics that measure how highly a model ranks each true triple among its candidates.
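
To make the notion of a rank concrete, the following is a minimal sketch of how one rank could be obtained from model scores. The function name `rank_of_true`, the convention that a higher score means a more plausible triple, and the optimistic handling of ties are all assumptions made for illustration; they are not taken from the paper.

```python
def rank_of_true(scores, true_index):
    """Return the 1-based rank of the true candidate among all candidates.

    Assumptions (illustrative only): a higher score means a more plausible
    triple, and ties are resolved optimistically, i.e. only candidates with
    a strictly higher score are counted as ranked ahead of the true one.
    """
    true_score = scores[true_index]
    better = sum(1 for s in scores if s > true_score)
    return better + 1


# Hypothetical scores for four candidate tail entities; the true one is index 2.
example_scores = [0.1, 0.9, 0.5, 0.7]
example_rank = rank_of_true(example_scores, true_index=2)
```

Here two candidates score higher than the true one, so `example_rank` is 3. Repeating this for every test triple yields the list of ranks that the metrics discussed next aggregate.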

Existing Metrics and Their Limitations

The paper begins by noting the ubiquitous use of Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits at K (Hits@K) as the primary metrics for evaluating KG embedding models. It criticizes them for lacking a comprehensive theoretical foundation and for depending on dataset size: because the attainable ranks grow with the number of candidate entities, the same value can indicate very different performance on different datasets. This prevents direct comparison across datasets and limits the metrics' utility in applications such as drug repositioning and target identification in the biomedical domain.
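
As a point of reference, the three standard metrics can be sketched in a few lines of Python. This is a minimal illustration that assumes the ranks are given as a list of positive integers (1 is best); the function names are chosen here and are not the paper's or PyKEEN's API.

```python
def mean_rank(ranks):
    """Arithmetic mean of the ranks; lower is better, minimum is 1."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank; higher is better, lies in (0, 1]."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of ranks that are at most k; higher is better, lies in [0, 1]."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of four test triples.
ranks = [1, 2, 4, 10]
mr = mean_rank(ranks)
mrr = mean_reciprocal_rank(ranks)
h3 = hits_at_k(ranks, k=3)
```

None of these functions takes the number of candidates as input, which is the root of the size dependence: a mean rank of 4.25 is unremarkable among 10 candidates but strong among 10,000.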

Proposed Developments: New Metrics and Adjustments

The authors propose a new theoretical framework for rank-based metrics and suggest several enhancements:

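Alternative Aggregation Functions: Using the generalized (Hölder) mean, the authors introduce metrics such as the Harmonic Mean Rank (HMR) and Geometric Mean Rank (GMR). These differ in how strongly they weight small (good) versus large (poor) ranks: the arithmetic mean is dominated by large ranks, the harmonic mean by small ranks, and the geometric mean lies in between. The HMR is the reciprocal of the MRR, which places the existing metrics in the same family.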
Probabilistic Adjustments: Inspired by previous adjustments to MR leading to Adjusted Mean Rank (AMR) and Adjusted Mean Rank Index (AMRI), the paper generalizes adjustments to all discussed metrics. This includes:
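- Expectation Adjustments: Dividing a metric by its expected value under random ranking.
- Index Adjustments: Rescaling a metric into an index in which random performance maps to 0 and optimal performance to 1, so that positive values indicate improvement over random performance.
- Z-Score Adjustments: Subtracting a metric's expected value under random ranking and dividing by its standard deviation. The central limit theorem justifies treating the result as approximately standard normal, so results are compared relative to their expected variation.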
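
The aggregation functions and the three adjustments listed above can be sketched for the mean rank as follows. This is an illustrative sketch under stated assumptions, not the paper's or PyKEEN's implementation: it assumes that under random ranking each rank is uniform on 1..N_i, where N_i is the number of candidates for the i-th test triple, and that ranks are independent. The function names and the sign convention of the z-score (positive means better than random) are chosen here for illustration.

```python
import math


def power_mean(ranks, p):
    """Generalized (Hölder) mean: p=1 arithmetic, p=0 geometric, p=-1 harmonic."""
    n = len(ranks)
    if p == 0:
        # Limit case p -> 0: exponential of the mean log-rank.
        return math.exp(sum(math.log(r) for r in ranks) / n)
    return (sum(r ** p for r in ranks) / n) ** (1.0 / p)


def expected_mean_rank(num_candidates):
    """E[MR] if each rank is uniform on 1..N_i (assumption)."""
    return sum((n + 1) / 2 for n in num_candidates) / len(num_candidates)


def variance_mean_rank(num_candidates):
    """Var[MR] for independent uniform ranks: (N_i^2 - 1) / 12 each (assumption)."""
    m = len(num_candidates)
    return sum((n * n - 1) / 12 for n in num_candidates) / (m * m)


def adjusted_mean_rank(ranks, num_candidates):
    """Expectation adjustment: MR divided by its expectation; 1 means random."""
    return power_mean(ranks, 1) / expected_mean_rank(num_candidates)


def adjusted_mean_rank_index(ranks, num_candidates):
    """Index adjustment: 0 means random, 1 means optimal (every rank is 1).

    Undefined when every triple has a single candidate (expectation equals 1).
    """
    expected = expected_mean_rank(num_candidates)
    return 1 - (power_mean(ranks, 1) - 1) / (expected - 1)


def z_mean_rank(ranks, num_candidates):
    """Z-score adjustment; sign flipped here so that larger is better."""
    expected = expected_mean_rank(num_candidates)
    std = math.sqrt(variance_mean_rank(num_candidates))
    return (expected - power_mean(ranks, 1)) / std


# Hypothetical ranks, each obtained among 100 candidates.
ranks = [1, 2, 4, 10]
candidates = [100, 100, 100, 100]
hmr = power_mean(ranks, -1)
gmr = power_mean(ranks, 0)
amr = adjusted_mean_rank(ranks, candidates)
amri = adjusted_mean_rank_index(ranks, candidates)
z = z_mean_rank(ranks, candidates)
```

Unlike the raw mean rank, the adjusted values take the number of candidates into account, so an index near 1 or a large z-score has the same reading regardless of how many entities a dataset contains.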

Implications and Future Research Directions

The proposed framework improves the interpretability and comparability of link prediction evaluations on KGs. It addresses the size dependence inherent in existing metrics, which matters most when comparing models across different datasets. The authors demonstrate the new metrics in a benchmark of knowledge graph embedding models.

Future work could further refine these metrics and explore their use in KG-related tasks beyond link prediction, such as entity alignment and query embedding. Further directions include extending the metrics to hyper-relational KGs and studying their implications for practical applications.

Overall, this work advances the state of evaluation methodologies in graph learning, emphasizing the necessity for metrics that truly reflect the capabilities of KG embedding methods. The provision of these enhanced metrics through the PyKEEN package allows for their widespread adoption and continued research in this domain.
