- The paper introduces a self-supervised method to perform entity alignment without relying on manually labeled pairs.
- It leverages a Relative Similarity Metric (RSM) with self-negative sampling and multiple negative queues to push non-aligned entities apart in the embedding space.
- Experiments demonstrate that SelfKG attains competitive performance on both monolingual and multilingual benchmarks.
The paper "SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs" (2203.01044) addresses the fundamental problem of entity alignment (EA) across different knowledge graphs (KGs). Existing state-of-the-art EA methods largely rely on large amounts of manually labeled aligned entity pairs for supervision. Acquiring these labels is expensive and time-consuming, especially for large-scale, real-world KGs, which hinders the practical application of these methods. The paper explores the possibility of performing entity alignment without any label supervision, using a self-supervised approach.
The core idea of SelfKG is based on the theoretical finding that entity alignment can benefit significantly from pushing unlabeled negative pairs far apart rather than solely relying on pulling labeled positive pairs closer. This concept is formalized as the Relative Similarity Metric (RSM). In the absence of ground truth labels, directly optimizing the "alignment" term of the Noise Contrastive Estimation (NCE) loss (which aims to pull positive pairs together) is not possible. SelfKG focuses on optimizing the "uniformity" term, which encourages representations of non-aligned entities to be far from each other. The theoretical analysis shows that optimizing this RSM objective indirectly pulls implicitly aligned entities closer in the embedding space.
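To make this concrete, the NCE objective can be split into the two terms described above. The notation below (encoder $f$, temperature $\tau$, $M$ negatives $y_i^-$) is generic contrastive-learning notation rather than a verbatim reproduction of the paper's equations:

$$
\mathcal{L}_{\mathrm{NCE}} = \underbrace{\mathbb{E}\left[-\tfrac{1}{\tau}\, f(x)^{\top} f(y)\right]}_{\text{alignment: pull labeled positives together}} + \underbrace{\mathbb{E}\left[\log\Big(e^{f(x)^{\top} f(y)/\tau} + \sum_{i=1}^{M} e^{f(x)^{\top} f(y_i^{-})/\tau}\Big)\right]}_{\text{uniformity: push negatives apart}}
$$

Without labeled pairs $(x, y)$ the first term cannot be evaluated, so SelfKG optimizes only a uniformity-style objective (the RSM), which the paper shows stays close enough to the full NCE loss that minimizing it implicitly pulls aligned pairs together.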
To implement this self-supervised learning objective effectively, SelfKG introduces two key strategies:
- Self-Negative Sampling: Standard supervised EA methods sample negative entities from the target KG for each entity in the source KG, excluding its known aligned counterpart. Without labels, sampling from the target KG risks picking the true positive as a negative (a "collision"), which can disrupt training, especially when many negative samples are used. SelfKG instead samples negative entities from the same KG as the anchor entity (e.g., negatives from $G_x$ for an entity in $G_x$). Theoretical analysis shows that this self-negative sampling strategy remains effective for aligning entities across KGs and is robust to potential duplicate entities within the same KG. The joint optimization objective pushes negatives sampled from $G_x$ away from entities in $G_y$, and negatives sampled from $G_y$ away from entities in $G_x$, as shown in Equation 2:
$$\mathcal{L} = \mathcal{L}_{\mathrm{RSM}|\lambda, x}(f; \tau, M, p_x) + \mathcal{L}_{\mathrm{RSM}|\lambda, y}(f; \tau, M, p_y)$$
- Multiple Negative Queues: Large numbers of negative samples are crucial for the effectiveness of contrastive objectives such as RSM, but encoding massive numbers of negatives in every training iteration is computationally expensive. SelfKG adapts the Momentum Contrast (MoCo) framework by maintaining two negative queues, one per KG ($G_x$ and $G_y$). These queues store the embeddings of previously processed entity batches from the respective KGs, providing a large pool of negative samples at limited computational cost. A momentum update strategy is applied to the target encoder that produces the queue embeddings to keep training stable. A minimal code sketch combining self-negative sampling with these queues follows this list.
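The following minimal sketch shows how the two strategies might combine in practice; the tensor shapes, queue size, and temperature are illustrative assumptions rather than the paper's exact implementation, and the momentum-updated target encoder that fills the queues is omitted:

```python
import torch
import torch.nn.functional as F

def uniformity_loss(anchors: torch.Tensor, negatives: torch.Tensor, tau: float = 0.08) -> torch.Tensor:
    """Push anchors away from negatives via log-sum-exp of scaled similarities.

    anchors:   (B, d) L2-normalized embeddings of the current batch from one KG.
    negatives: (M, d) L2-normalized embeddings from the negative queue of the SAME KG
               (self-negative sampling: no risk of colliding with the true cross-KG match).
    """
    sim = anchors @ negatives.t() / tau          # (B, M) similarity logits
    return torch.logsumexp(sim, dim=1).mean()    # uniformity-style term of the NCE loss

# Hypothetical setup: 768-d embeddings, 64 entities per batch, 4096 negatives per queue.
B, M, d = 64, 4096, 768
emb_x = F.normalize(torch.randn(B, d), dim=-1)    # batch of entities from G_x
emb_y = F.normalize(torch.randn(B, d), dim=-1)    # batch of entities from G_y
queue_x = F.normalize(torch.randn(M, d), dim=-1)  # negative queue for G_x (past batches)
queue_y = F.normalize(torch.randn(M, d), dim=-1)  # negative queue for G_y (past batches)

# Joint objective in the spirit of Equation 2: one self-negative term per KG.
loss = uniformity_loss(emb_x, queue_x) + uniformity_loss(emb_y, queue_y)
```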
SelfKG leverages existing techniques for initial entity embedding and neighborhood aggregation. It uses a pre-trained multilingual language model, specifically LaBSE (Language-agnostic BERT Sentence Embedding), to embed entity names from different KGs into a shared vector space (uni-space learning). A single-layer Graph Attention Network (GAT) aggregates information from one-hop neighbors, refining the initial entity embeddings. Ablation studies show that while high-quality initial embeddings from models such as LaBSE are beneficial, the SelfKG training process provides significant further improvement.
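As a hedged illustration of this pipeline, the sketch below uses the sentence-transformers release of LaBSE and a PyTorch Geometric GATConv layer as stand-ins for the paper's exact implementation; the entity names, edge list, and fusion of the two views are hypothetical:

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
from torch_geometric.nn import GATConv

# 1) Uni-space initialization: embed surface names of entities from both KGs with the
#    same multilingual encoder so they land in one shared vector space.
encoder = SentenceTransformer("sentence-transformers/LaBSE")
entity_names = ["Paris", "巴黎", "France", "法国"]             # hypothetical entities from G_x and G_y
init_emb = torch.tensor(encoder.encode(entity_names))         # (num_entities, 768)

# 2) One-hop neighborhood aggregation with a single GAT layer.
#    edge_index holds directed (neighbor -> entity) edges; the values are illustrative only.
edge_index = torch.tensor([[2, 3], [0, 1]], dtype=torch.long)  # France -> Paris, 法国 -> 巴黎
gat = GATConv(in_channels=init_emb.size(1), out_channels=init_emb.size(1), heads=1)
neigh_emb = gat(init_emb, edge_index)

# 3) Fuse the name view and the neighborhood view (a simple sum here; the paper's exact
#    combination may differ) and L2-normalize so dot products act as cosine similarities.
entity_emb = F.normalize(init_emb + neigh_emb, dim=-1)
```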
The paper evaluates SelfKG on two benchmark datasets: DWY100K (monolingual) and DBP15K (multilingual: Chinese-English, Japanese-English, French-English). Experiments demonstrate that SelfKG, using 0% of the training labels, achieves performance comparable to or surpassing many state-of-the-art supervised methods that use 100% of the training labels. On the DBpedia-YAGO subset of DWY100K (DWY100K_dbp-yg), SelfKG achieves a Hit@1 of 1.000, matching the supervised state of the art. On DBP15K, SelfKG's performance is slightly below the top supervised methods (such as BERT-INT), indicating that multilingual alignment without supervision remains the more challenging task.
Ablation studies highlight the importance of the proposed components:
- Removing the RSM-based objective significantly reduces performance.
- Removing neighborhood aggregation also leads to a performance drop.
- Replacing self-negative sampling with traditional cross-KG sampling without labels (risking collision) also reduces performance, confirming the effectiveness of the self-negative sampling strategy.
- Using lower-quality initial embeddings (FastText instead of LaBSE) results in lower overall performance, but SelfKG training still provides substantial gains over the initial embeddings.
- Using multi-hop neighbors or explicit relation information was found to be less beneficial or even harmful in the self-supervised setting compared to 1-hop structural information.
Hyperparameter studies show that increasing the number of negative samples (via queue size and batch size) improves performance, consistent with the theoretical analysis. The momentum coefficient in the MoCo mechanism is also crucial for training stability and performance.
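For reference, the MoCo-style momentum update of the target (key) encoder parameters $\theta_k$ from the online (query) encoder parameters $\theta_q$ takes the standard form

$$\theta_k \leftarrow m\,\theta_k + (1 - m)\,\theta_q,$$

where a momentum coefficient $m$ close to 1 changes the target encoder slowly, keeping the queued negative embeddings consistent across iterations.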
In a low-resource comparison, self-supervised SelfKG (0% labels) significantly outperforms a supervised variant of SelfKG trained with only a small fraction of the labels (e.g., anything below roughly 25%).
The practical implications are significant: SelfKG provides a viable approach to perform high-quality entity alignment without the need for expensive manual annotation of aligned entity pairs, making it more applicable to real-world scenarios with large and evolving KGs.
Limitations identified include the reliance on good initial entity embeddings and a performance gap with some highly supervised methods, particularly in challenging multilingual settings.
The code and data for SelfKG are publicly available at \url{https://github.com/THUDM/SelfKG}, facilitating reproducibility and further research.
In summary, SelfKG makes a notable step towards self-supervised entity alignment by redesigning the learning objective based on relative similarity, introducing a robust self-negative sampling strategy, and efficiently scaling negative sampling using multiple momentum-updated queues. The results demonstrate that high-quality entity alignment is achievable without traditional label supervision.