Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment
In cross-lingual entity alignment for knowledge graphs (KGs), jointly embedding multilingual knowledge structures and descriptive entity information is an emerging topic of substantial research interest. This paper presents an approach that leverages co-training to integrate bilingual KG embeddings with entity-description embeddings, enabling semi-supervised cross-lingual learning. The work addresses a critical gap: existing models often struggle with low coverage of inter-lingual links (ILLs), particularly in multilingual KGs with sparse alignment data.
The paper introduces a framework that co-trains two distinct but interdependent embedding models: a multilingual KG embedding model and a multilingual literal description embedding model. The framework is implemented and tested on a trilingual dataset derived from Wikipedia, where large-scale ILL data are scarce. The primary results show consistent, iterative improvements on the entity alignment task, outperforming existing approaches in accuracy and applicability.
Technical Highlights and Strong Results
The framework leverages a translational KG embedding model, a variant of TransE, which captures the structure of each language-specific KG in a low-dimensional space and formalizes cross-lingual inference through linear transformations between those spaces. By mapping entity vectors across spaces with such transformations, cross-lingual relationships can be inferred directly.
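To make the mechanics concrete, here is a minimal sketch in pure Python of the two ingredients described above: a TransE-style plausibility score (lower distance between h + r and t means a more plausible triple) and a linear transformation that projects an entity vector from one language's embedding space into another's. The vectors and the identity transformation below are toy placeholders, not values from the paper; a real system would learn all of them.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: the smaller ||h + r - t||, the more plausible the triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def project(M, e):
    """Apply a linear transformation M (row-major matrix) to entity vector e,
    mapping it from one language-specific space into another."""
    return [sum(m * x for m, x in zip(row, e)) for row in M]

# Toy 3-d embeddings (hypothetical): the tail is built so the triple holds exactly.
h = [0.2, -0.1, 0.5]   # head entity
r = [0.3, 0.4, -0.2]   # relation
t = [0.5, 0.3, 0.3]    # tail entity, equal to h + r

# Identity matrix stands in for a trained cross-lingual transformation.
M = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

assert transe_score(h, r, t) < 1e-9   # a plausible triple scores near zero
assert project(M, h) == h             # identity map leaves the vector unchanged
```

In the actual framework, aligned entities across languages should land near each other after the learned transformation, which is what makes cross-lingual nearest-neighbor inference possible.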
The description model, in turn, employs an attentive gated recurrent unit (AGRU) encoder, built on multilingual word embeddings, to encode the semantics of multilingual entity descriptions. The two models iteratively refine each other by exchanging high-confidence ILLs, gradually enriching the training data for subsequent iterations and significantly improving metrics such as Hits@k accuracy and mean reciprocal rank (MRR).
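The co-training exchange can be sketched as a simple loop: each model in turn proposes candidate alignments it scores above a confidence threshold, and those proposals enter the shared pool of ILLs that the other model trains on next. The scoring functions and candidate pairs below are hypothetical stand-ins; in the real framework each model would be retrained on the enlarged pool at every round.

```python
def cotrain(score_kg, score_desc, candidates, seed_ills, threshold, rounds):
    """Alternate between two models; each adds candidate pairs it scores
    above `threshold` to the shared ILL pool used by the other model."""
    ills = set(seed_ills)
    for _ in range(rounds):
        for score in (score_kg, score_desc):   # alternate the two models
            new = {(e1, e2) for (e1, e2) in candidates
                   if (e1, e2) not in ills and score(e1, e2) >= threshold}
            ills |= new                        # enrich the training data
            # (a full implementation would retrain the other model on `ills` here)
    return ills

# Hypothetical confidence scores each model assigns to candidate pairs.
kg_conf = {("a1", "b1"): 0.90, ("a2", "b2"): 0.40, ("a3", "b3"): 0.20}
desc_conf = {("a1", "b1"): 0.30, ("a2", "b2"): 0.95, ("a3", "b3"): 0.10}
candidates = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]

score_kg = lambda e1, e2: kg_conf.get((e1, e2), 0.0)
score_desc = lambda e1, e2: desc_conf.get((e1, e2), 0.0)

found = cotrain(score_kg, score_desc, candidates, set(), threshold=0.8, rounds=2)
assert found == {("a1", "b1"), ("a2", "b2")}   # each model contributes one pair
```

The point of the alternation is complementarity: a pair only one model is confident about still enters the pool, so each model benefits from evidence the other sees.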
Across experiments, the proposed model consistently outperforms state-of-the-art baselines, including several MTransE variants and self-training models such as ITransE, on entity alignment accuracy. Notably, the co-training method improves results even in zero-shot alignment scenarios, successfully mapping entities that lack structural links but have textual descriptions in both languages.
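Zero-shot alignment in this setting reduces to nearest-neighbor search over description embeddings: an entity with no ILLs or structural overlap can still be matched to the target-language entity whose description embedding is most similar in the shared space. The embeddings and entity names below are illustrative placeholders, not outputs of the paper's AGRU encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def zero_shot_align(src_desc, tgt_desc):
    """Match each source entity to the target entity with the most similar
    description embedding -- no structural links required."""
    return {e: max(tgt_desc, key=lambda t: cosine(v, tgt_desc[t]))
            for e, v in src_desc.items()}

# Hypothetical 2-d description embeddings already projected into a shared space.
en = {"Paris": [0.90, 0.10], "Lyon": [0.20, 0.80]}
fr = {"Paris_fr": [0.88, 0.12], "Lyon_fr": [0.15, 0.85]}

assert zero_shot_align(en, fr) == {"Paris": "Paris_fr", "Lyon": "Lyon_fr"}
```

This is why the description model is the key enabler for zero-shot cases: structural embeddings have nothing to anchor on for unlinked entities, while text provides a language-bridging signal.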
Theoretical and Practical Implications
This dual-model co-training strategy has significant implications for both theory and practice in AI applications that demand cross-lingual entity linking. Theoretically, it extends the utility of embedding models by providing an effective mechanism for incorporating descriptive semantics, which KG alignment work has traditionally disregarded. Practically, it establishes a scalable paradigm for improving cross-lingual semantic interoperability in environments such as multilingual QA systems or federated data integration frameworks, where language inconsistencies pose significant barriers.
Future Directions
The paper leaves several avenues open for future exploration. One promising direction is to substitute non-translational models, such as those based on dot products or convolutional architectures, into the KG embedding component and assess whether they further improve performance when co-trained with description embeddings. Another is to explore ensemble methods for cross-lingual prediction, for example interpolating predictions through intermediary multilingual structures, to strengthen KG completion in underrepresented language datasets.
Through its methodological innovations and empirical validations, this paper contributes meaningfully to the field of multilingual knowledge representation and inference, suggesting pathways for sophisticated, scalable approaches to breaking language barriers in knowledge processing systems.