Jointly Learning Entity and Relation Representations for Entity Alignment
(1909.09317v1)
Published 20 Sep 2019 in cs.CL
Abstract: Entity alignment is a viable means for integrating heterogeneous knowledge among different knowledge graphs (KGs). Recent developments in the field often take an embedding-based approach to model the structural information of KGs so that entity alignment can be easily performed in the embedding space. However, most existing works do not explicitly utilize useful relation representations to assist in entity alignment, which, as we will show in the paper, is a simple yet effective way for improving entity alignment. This paper presents a novel joint learning framework for entity alignment. At the core of our approach is a Graph Convolutional Network (GCN) based framework for learning both entity and relation representations. Rather than relying on pre-aligned relation seeds to learn relation representations, we first approximate them using entity embeddings learned by the GCN. We then incorporate the relation approximation into entities to iteratively learn better representations for both. Experiments performed on three real-world cross-lingual datasets show that our approach substantially outperforms state-of-the-art entity alignment methods.
The paper proposes a joint learning framework that integrates entity and relation representations using a Highway Graph Convolutional Network (HGCN) to improve alignment across heterogeneous knowledge graphs.
The model leverages layered highway gates and iterative refinement to optimize embeddings, resulting in superior performance on cross-lingual datasets.
Experimental results demonstrate that entity name initialization (even with imperfect machine translation) and relation context aggregation each significantly boost both entity and relation alignment.
This paper, "Jointly Learning Entity and Relation Representations for Entity Alignment" (Wu et al., 2019), addresses the challenge of entity alignment across heterogeneous knowledge graphs (KGs). Entity alignment is crucial for integrating information from different KGs that describe the same real-world entities using varying naming conventions or structures. Existing embedding-based methods, while effective, often focus primarily on entity embeddings and either neglect explicit use of relation representations or rely heavily on pre-aligned relation seeds, which can be costly to obtain. Graph Convolutional Networks (GCNs) have shown promise for embedding KGs but typically do not model relations directly; translation-based methods, by contrast, do learn relation embeddings, yet existing alignment approaches struggle to exploit this relation information explicitly for entity alignment.
The authors propose a novel joint learning framework that simultaneously learns high-quality representations for both entities and relations to improve alignment performance. The core idea is to leverage the complementary information between entities and their connecting relations. The framework operates in three stages:
Preliminary Entity Alignment: Initially, the two KGs are treated as a single graph. A Highway-GCN (HGCN) model is used to learn entity embeddings in a unified vector space. HGCN incorporates highway gates within GCN layers to regulate information flow, aiming to mitigate noise propagation and improve representation learning, especially for higher-degree neighborhoods. Entity alignment is performed by computing the L1 distance between entity embeddings. Training uses a margin-based loss with hard negative sampling: negative pairs are drawn from the K nearest entities in the embedding space. Initial entity features are derived from machine-translated entity names, which is shown to provide a valuable initial signal despite potential translation inaccuracies.
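The highway gating and margin-based loss described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the exact layer structure, parameter names, and the margin value `gamma` are illustrative, not the authors' implementation.

```python
import numpy as np

def highway_gcn_layer(X, A_hat, W, W_gate, b_gate):
    """One Highway-GCN layer (sketch): a transform gate T decides, per
    dimension, how much of the new GCN output to keep versus the input.
    X: (n, d) node features; A_hat: (n, n) normalized adjacency;
    W, W_gate: (d, d) weights (d kept constant so the highway mix is valid)."""
    H = np.maximum(A_hat @ X @ W, 0.0)                  # GCN propagation + ReLU
    T = 1.0 / (1.0 + np.exp(-(X @ W_gate + b_gate)))    # sigmoid transform gate
    return T * H + (1.0 - T) * X                        # highway mix of new and old

def margin_loss(pos_src, pos_tgt, neg_src, neg_tgt, gamma=1.0):
    """Margin-based alignment loss over L1 distances (sketch): pull seed
    pairs together while pushing sampled negative pairs at least gamma apart."""
    pos = np.abs(pos_src - pos_tgt).sum(axis=1)         # distances of seed pairs
    neg = np.abs(neg_src - neg_tgt).sum(axis=1)         # distances of negatives
    return np.maximum(pos - neg + gamma, 0.0).mean()
```

In training, the negatives fed to `margin_loss` would be the K-nearest entities of each seed entity in the current embedding space, recomputed periodically.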
Approximating Relation Representations: Since GCNs do not explicitly model relations, relation representations are approximated from the entity embeddings learned in the first stage. For a given relation, its representation is computed by averaging the HGCN embeddings of its head entities and of its tail entities, concatenating the two averages, and applying a learnable linear transformation to the result. Relation alignment can then be performed by measuring the distance between these approximated relation representations, with an additional term that rewards relations whose head/tail entities are more frequently aligned under the initial entity alignments. This stage is unsupervised with respect to relations: no pre-aligned relation seeds are required.
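A sketch of how a relation's representation might be approximated from its head and tail entities, following the averaging-and-concatenation scheme described above. Function and parameter names here are illustrative assumptions, not identifiers from the paper's code.

```python
import numpy as np

def approx_relation(triples, entity_emb, W_r, b_r):
    """Approximate one relation's representation (sketch).

    triples: list of (head, tail) entity index pairs in which the
             relation appears;
    entity_emb: (n, d) HGCN entity embeddings;
    W_r: (2d, d_r), b_r: (d_r,) -- a learnable linear map applied to the
             concatenated head/tail averages.
    """
    heads = entity_emb[[h for h, _ in triples]]
    tails = entity_emb[[t for _, t in triples]]
    agg = np.concatenate([heads.mean(axis=0), tails.mean(axis=0)])  # (2d,)
    return agg @ W_r + b_r                                          # (d_r,)

def relation_distance(r1, r2):
    """L1 distance between two approximated relation representations."""
    return np.abs(r1 - r2).sum()
```

In the full method this L1 distance would be combined with the extra term rewarding relations that share many aligned head/tail entities.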
Joint Entity and Relation Alignment: This is the core iterative refinement stage. After the preliminary entity alignment model has converged, the approximated relation representations are integrated into the entity embeddings. For each entity, a relation context vector is created by aggregating the representations of its adjacent relations. This relation context vector is then combined (e.g., concatenated) with the entity's HGCN embedding to form a new "joint entity representation". The model is then further trained using the original seed entity alignments, but now using the joint entity representations to compute the loss. This process allows entity embeddings and relation approximations (which are derived from entity embeddings) to iteratively improve each other, leading to better representations for both and consequently better alignment results for both entities and relations.
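The joint entity representation step above can be sketched as follows. Mean aggregation over an entity's adjacent relations and all names below are assumptions made for illustration.

```python
import numpy as np

def joint_entity_repr(entity_emb, rel_repr, ent2rels):
    """Build joint entity representations (sketch): concatenate each
    entity's HGCN embedding with its relation context vector, i.e. the
    mean of its adjacent relations' approximated representations.

    entity_emb: (n, d); rel_repr: (m, d_r);
    ent2rels: {entity index: list of adjacent relation indices}.
    Returns an (n, d + d_r) matrix used in place of the plain entity
    embeddings when recomputing the alignment loss on the seed pairs.
    """
    joint = []
    for e in range(entity_emb.shape[0]):
        ctx = rel_repr[ent2rels[e]].mean(axis=0)        # relation context vector
        joint.append(np.concatenate([entity_emb[e], ctx]))
    return np.stack(joint)
```

Because `rel_repr` is itself derived from `entity_emb`, retraining on these joint representations lets the two iteratively refine each other, which is the crux of the joint stage.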
The model is evaluated on three real-world cross-lingual datasets (DBP15K ZH-EN, JA-EN, FR-EN). The experiments demonstrate that the proposed HGCN-JE (Joint Entity) and HGCN-JR (Joint Relation) models significantly outperform state-of-the-art baseline methods like BootEA and GCN (which only focuses on entities) on both entity and relation alignment tasks.
Key findings from the experiments include:
Entity name initialization, even with imperfect machine translations, substantially boosts performance compared to random initialization.
Layer-wise highway gates in HGCN effectively improve entity alignment by better controlling information flow.
Jointly learning entity and relation representations by incorporating relation context into entity embeddings benefits both entity and relation alignment; the joint variants HGCN-JE and GCN-JE improve over their preliminary counterparts HGCN-PE and GCN-PE.
The method for approximating relation representations based on head/tail entity embeddings is effective, enabling good relation alignment performance without seed relation pairs.
The approach is robust and performs well even with a smaller proportion of seed entity alignments compared to baselines.
The paper concludes that the joint learning framework, by effectively integrating relation information into entity representations and allowing iterative refinement of both, yields superior and more robust entity and relation alignments using only a small set of pre-aligned entities for training.