- The paper introduces a self-supervised method to perform entity alignment without relying on manually labeled pairs.
- It leverages a Relative Similarity Metric (RSM) with self-negative sampling and multiple negative queues to push non-aligned entities apart in the embedding space.
- Experiments demonstrate that SelfKG attains competitive performance on both monolingual and multilingual benchmarks.
The paper "SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs" (2203.01044) addresses the fundamental problem of entity alignment (EA) across different knowledge graphs (KGs). Existing state-of-the-art EA methods largely rely on large amounts of manually labeled aligned entity pairs for supervision. Acquiring these labels is expensive and time-consuming, especially for large-scale, real-world KGs, which hinders the practical application of these methods. The paper explores the possibility of performing entity alignment without any label supervision, using a self-supervised approach.
The core idea of SelfKG is based on the theoretical finding that entity alignment can benefit significantly from pushing unlabeled negative pairs far apart rather than solely relying on pulling labeled positive pairs closer. This concept is formalized as the Relative Similarity Metric (RSM). In the absence of ground truth labels, directly optimizing the "alignment" term of the Noise Contrastive Estimation (NCE) loss (which aims to pull positive pairs together) is not possible. SelfKG focuses on optimizing the "uniformity" term, which encourages representations of non-aligned entities to be far from each other. The theoretical analysis shows that optimizing this RSM objective indirectly pulls implicitly aligned entities closer in the embedding space.
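To make this concrete, the NCE objective can be split into the two terms described above. The notation below (encoder $f$, temperature $\tau$, $M$ negatives $y_i^-$) is generic contrastive-learning notation rather than a verbatim reproduction of the paper's equations:

$$
\mathcal{L}_{\mathrm{NCE}} = \underbrace{\mathbb{E}\left[-\tfrac{1}{\tau}\, f(x)^{\top} f(y)\right]}_{\text{alignment: pull labeled positives together}} + \underbrace{\mathbb{E}\left[\log\Big(e^{f(x)^{\top} f(y)/\tau} + \sum_{i=1}^{M} e^{f(x)^{\top} f(y_i^{-})/\tau}\Big)\right]}_{\text{uniformity: push negatives apart}}
$$

Without labeled pairs $(x, y)$ the first term cannot be evaluated, so SelfKG optimizes only a uniformity-style objective (the RSM), which the paper shows stays close enough to the full NCE loss that minimizing it implicitly pulls aligned pairs together.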
To implement this self-supervised learning objective effectively, SelfKG introduces two key strategies:
- Self-Negative Sampling: Standard supervised EA methods sample negative entities from the target KG for each entity in the source KG, excluding its known aligned counterpart. Without labels, sampling from the target KG risks picking the true positive as a negative (a "collision"), which can disrupt training, especially when many negative samples are used. SelfKG instead samples negative entities from the same KG as the anchor entity (e.g., negatives from $G_x$ for an entity in $G_x$). Theoretical analysis shows that this self-negative sampling strategy remains effective for aligning entities across KGs and is robust to potential duplicate entities within the same KG. The joint optimization objective pushes negatives sampled from $G_x$ away from entities in $G_y$, and negatives sampled from $G_y$ away from entities in $G_x$, as shown in Equation 2:
$$\mathcal{L} = \mathcal{L}_{\mathrm{RSM}|\lambda, x}(f; \tau, M, p_x) + \mathcal{L}_{\mathrm{RSM}|\lambda, y}(f; \tau, M, p_y)$$
- Multiple Negative Queues: Large numbers of negative samples are crucial for the effectiveness of contrastive objectives such as RSM, but encoding massive numbers of negatives in every training iteration is computationally expensive. SelfKG adapts the Momentum Contrast (MoCo) framework by maintaining two negative queues, one per KG ($G_x$ and $G_y$). These queues store the embeddings of previously processed entity batches from the respective KGs, providing a large pool of negative samples at limited computational cost. A momentum update strategy is applied to the target encoder that produces the queue embeddings to keep training stable. A minimal code sketch combining self-negative sampling with these queues follows this list.
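The following minimal sketch shows how the two strategies might combine in practice; the tensor shapes, queue size, and temperature are illustrative assumptions rather than the paper's exact implementation, and the momentum-updated target encoder that fills the queues is omitted:

```python
import torch
import torch.nn.functional as F

def uniformity_loss(anchors: torch.Tensor, negatives: torch.Tensor, tau: float = 0.08) -> torch.Tensor:
    """Push anchors away from negatives via log-sum-exp of scaled similarities.

    anchors:   (B, d) L2-normalized embeddings of the current batch from one KG.
    negatives: (M, d) L2-normalized embeddings from the negative queue of the SAME KG
               (self-negative sampling: no risk of colliding with the true cross-KG match).
    """
    sim = anchors @ negatives.t() / tau          # (B, M) similarity logits
    return torch.logsumexp(sim, dim=1).mean()    # uniformity-style term of the NCE loss

# Hypothetical setup: 768-d embeddings, 64 entities per batch, 4096 negatives per queue.
B, M, d = 64, 4096, 768
emb_x = F.normalize(torch.randn(B, d), dim=-1)    # batch of entities from G_x
emb_y = F.normalize(torch.randn(B, d), dim=-1)    # batch of entities from G_y
queue_x = F.normalize(torch.randn(M, d), dim=-1)  # negative queue for G_x (past batches)
queue_y = F.normalize(torch.randn(M, d), dim=-1)  # negative queue for G_y (past batches)

# Joint objective in the spirit of Equation 2: one self-negative term per KG.
loss = uniformity_loss(emb_x, queue_x) + uniformity_loss(emb_y, queue_y)
```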
SelfKG leverages existing techniques for initial entity embedding and neighborhood aggregation. It uses a pre-trained multilingual language model, specifically LaBSE (Language-agnostic BERT Sentence Embedding), to embed entity names from different KGs into a shared vector space (uni-space learning). A single-layer Graph Attention Network (GAT) aggregates information from one-hop neighbors, refining the initial entity embeddings. Ablation studies show that while high-quality initial embeddings from models such as LaBSE are beneficial, the SelfKG training process provides significant further improvement.
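As a hedged illustration of this pipeline, the sketch below uses the sentence-transformers release of LaBSE and a PyTorch Geometric GATConv layer as stand-ins for the paper's exact implementation; the entity names, edge list, and fusion of the two views are hypothetical:

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
from torch_geometric.nn import GATConv

# 1) Uni-space initialization: embed surface names of entities from both KGs with the
#    same multilingual encoder so they land in one shared vector space.
encoder = SentenceTransformer("sentence-transformers/LaBSE")
entity_names = ["Paris", "巴黎", "France", "法国"]             # hypothetical entities from G_x and G_y
init_emb = torch.tensor(encoder.encode(entity_names))         # (num_entities, 768)

# 2) One-hop neighborhood aggregation with a single GAT layer.
#    edge_index holds directed (neighbor -> entity) edges; the values are illustrative only.
edge_index = torch.tensor([[2, 3], [0, 1]], dtype=torch.long)  # France -> Paris, 法国 -> 巴黎
gat = GATConv(in_channels=init_emb.size(1), out_channels=init_emb.size(1), heads=1)
neigh_emb = gat(init_emb, edge_index)

# 3) Fuse the name view and the neighborhood view (a simple sum here; the paper's exact
#    combination may differ) and L2-normalize so dot products act as cosine similarities.
entity_emb = F.normalize(init_emb + neigh_emb, dim=-1)
```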
The paper evaluates SelfKG on two benchmark datasets: DWY100K (monolingual) and DBP15K (multilingual: Chinese-English, Japanese-English, French-English). Experiments demonstrate that SelfKG, using 0% of the training labels, achieves performance comparable to or surpassing many state-of-the-art supervised methods that use 100% of the training labels. On the DBpedia-YAGO subset of DWY100K (DWY100K_dbp-yg), SelfKG achieves a Hit@1 of 1.000, matching the supervised state of the art. On DBP15K, SelfKG's performance is slightly below the top supervised methods (such as BERT-INT), indicating that multilingual alignment without supervision remains the more challenging task.
Ablation studies highlight the importance of the proposed components:
- Removing the RSM-based objective significantly reduces performance.
- Removing neighborhood aggregation also leads to a performance drop.
- Replacing self-negative sampling with traditional cross-KG sampling without labels (risking collision) also reduces performance, confirming the effectiveness of the self-negative sampling strategy.
- Using lower-quality initial embeddings (FastText instead of LaBSE) results in lower overall performance, but SelfKG training still provides substantial gains over the initial embeddings.
- Using multi-hop neighbors or explicit relation information was found to be less beneficial or even harmful in the self-supervised setting compared to 1-hop structural information.
Hyperparameter studies show that increasing the number of negative samples (via queue size and batch size) improves performance, consistent with the theoretical analysis. The momentum coefficient in the MoCo mechanism is also crucial for training stability and performance.
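For reference, the MoCo-style momentum update of the target (key) encoder parameters $\theta_k$ from the online (query) encoder parameters $\theta_q$ takes the standard form

$$\theta_k \leftarrow m\,\theta_k + (1 - m)\,\theta_q,$$

where a momentum coefficient $m$ close to 1 changes the target encoder slowly, keeping the queued negative embeddings consistent across iterations.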
In a low-resource comparison, self-supervised SelfKG (0% labels) significantly outperforms a supervised variant of SelfKG trained with only a small fraction of the labels (e.g., anything below roughly 25%).
The practical implications are significant: SelfKG provides a viable approach to perform high-quality entity alignment without the need for expensive manual annotation of aligned entity pairs, making it more applicable to real-world scenarios with large and evolving KGs.
Limitations identified include the reliance on good initial entity embeddings and a performance gap with some highly supervised methods, particularly in challenging multilingual settings.
The code and data for SelfKG are publicly available at \url{https://github.com/THUDM/SelfKG}, facilitating reproducibility and further research.
In summary, SelfKG makes a notable step towards self-supervised entity alignment by redesigning the learning objective based on relative similarity, introducing a robust self-negative sampling strategy, and efficiently scaling negative sampling using multiple momentum-updated queues. The results demonstrate that high-quality entity alignment is achievable without traditional label supervision.