mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models (2110.08151v3)

Published 15 Oct 2021 in cs.CL

Abstract: Recent studies have shown that multilingual pretrained language models can be effectively improved with cross-lingual alignment information from Wikipedia entities. However, existing methods only exploit entity information in pretraining and do not explicitly use entities in downstream tasks. In this study, we explore the effectiveness of leveraging entity representations for downstream cross-lingual tasks. We train a multilingual language model with 24 languages with entity representations and show the model consistently outperforms word-based pretrained models in various cross-lingual transfer tasks. We also analyze the model and the key insight is that incorporating entity representations into the input allows us to extract more language-agnostic features. We also evaluate the model with a multilingual cloze prompt task with the mLAMA dataset. We show that entity-based prompts elicit correct factual knowledge more likely than using only word representations. Our source code and pretrained models are available at https://github.com/studio-ousia/luke.

Citations (29)

Summary

  • The paper introduces mLUKE, a multilingual model that combines masked language modeling with masked entity prediction to enhance cross-lingual performance.
  • It employs entity linking and [MASK] token strategies to integrate token and entity contexts for improved information extraction.
  • Experimental results demonstrate that mLUKE outperforms word-based models, achieving higher accuracy on QA and superior F1 scores on RE and NER.

mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models

The paper, "mLUKE: The Power of Entity Representations in Multilingual Pretrained LLMs," introduces a novel approach to enhancing multilingual pretrained LLMs by incorporating entity representations. The primary focus of this paper is on the effectiveness of entity representations for cross-lingual tasks, leveraging information from Wikipedia entities to improve LLMs' performance.

Methodology

The core contribution of this paper is the development of Multilingual LUKE (mLUKE), an extension of LUKE designed for multilingual applications. The authors train the model with both masked language modeling (MLM) and masked entity prediction (MEP) objectives. mLUKE takes tokenized text and its associated entities as input simultaneously, transforming both into contextualized representations with a shared bidirectional transformer encoder.
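As a concrete illustration of this input format, the minimal sketch below encodes a sentence together with entity spans and reads off both word-level and entity-level representations. It assumes the Hugging Face transformers library and the publicly released studio-ousia/mluke-base checkpoint with its MLukeTokenizer; the example sentence and spans are illustrative, not from the paper.

```python
import torch
from transformers import MLukeTokenizer, LukeModel

# Sketch of mLUKE's input format: word tokens plus entity spans, encoded by
# one shared bidirectional transformer (assumes the public mLUKE checkpoint).
tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/mluke-base")
model = LukeModel.from_pretrained("studio-ousia/mluke-base")

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character spans of "Beyoncé" and "Los Angeles"

# Without an explicit `entities` argument, the spans are filled with the
# entity [MASK] token rather than linked Wikipedia entities.
inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

word_repr = outputs.last_hidden_state            # (1, num_word_tokens, hidden_size)
entity_repr = outputs.entity_last_hidden_state   # (1, num_entity_spans, hidden_size)
```

Because both streams pass through the same encoder, the entity representations are contextualized against the surrounding words, which is what the downstream applications below exploit.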

The researchers explore two primary applications of entity representations:

  1. Entity Linking for Input Text: This approach involves identifying entities in input texts and attaching them to the input sequence, serving as language-independent features.
  2. Entity [MASK] Token: The entity [MASK] token is used as a feature extractor for tasks such as span classification, relation extraction, and named entity recognition (see the sketch below).
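As a rough sketch of the second application, the snippet below marks the head and tail mentions of a relation-extraction example as entity spans (which default to the entity [MASK] token when no entity names are given) and feeds the resulting contextualized entity representations to a task classifier. The checkpoint name, the example sentence, and the 42-label classifier head are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import MLukeTokenizer, LukeModel

tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/mluke-base")
encoder = LukeModel.from_pretrained("studio-ousia/mluke-base")

text = "Ousia was founded in Tokyo."
head_span, tail_span = (0, 5), (21, 26)   # character offsets of "Ousia" and "Tokyo"
inputs = tokenizer(text, entity_spans=[head_span, tail_span], return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# Contextualized representations of the two entity [MASK] slots.
head_repr = outputs.entity_last_hidden_state[:, 0]   # (1, hidden_size)
tail_repr = outputs.entity_last_hidden_state[:, 1]   # (1, hidden_size)

# Hypothetical task head: 42 relation labels on the concatenated features.
classifier = torch.nn.Linear(2 * encoder.config.hidden_size, 42)
relation_logits = classifier(torch.cat([head_repr, tail_repr], dim=-1))
```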

Experimental Results

The experimental evaluation encompasses various cross-lingual tasks, including question answering (QA), relation extraction (RE), and named entity recognition (NER). The authors provide a comprehensive comparison between models utilizing word-based representations and those incorporating explicit entity representations.

  • Question Answering: Across datasets such as XQuAD and MLQA, mLUKE consistently outperformed its word-based counterparts, reinforcing the utility of entity representations. Notably, mLUKE also achieved superior accuracy on multilingual zero-shot cloze prompt tasks (mLAMA), reducing language bias and improving factual knowledge retrieval (see the sketch after this list).
  • Relation Extraction and NER: In RE and NER tasks, mLUKE maintained its edge by capturing more language-neutral features, reflected in improved F1 scores over word-based approaches.
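To make the entity-based prompting idea concrete, here is a purely conceptual sketch of how an mLAMA-style cloze query could be scored: the contextualized representation of an entity [MASK] token placed over the answer slot is compared against candidate entity embeddings. All tensors below are placeholders; in the actual model the scores come from the masked entity prediction head over the shared entity vocabulary.

```python
import torch

hidden_size = 768
candidates = ["Tokyo", "Kyoto", "Osaka"]          # hypothetical candidate entities

# Placeholder for the contextualized entity [MASK] representation at the
# answer position of a cloze prompt such as "The capital of Japan is <mask>."
mask_entity_repr = torch.randn(hidden_size)

# Placeholder for the pretrained entity embeddings of the candidates.
candidate_embeddings = torch.randn(len(candidates), hidden_size)

# Rank candidates by dot-product score, as a masked entity prediction head would.
scores = candidate_embeddings @ mask_entity_repr
prediction = candidates[int(scores.argmax())]
```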

Analysis and Implications

Through quantitative evaluations such as cross-lingual contextualized word retrieval, the paper argues that mLUKE's entity representations yield better-aligned cross-lingual sentence-level features (the retrieval setup is sketched below). The experiments also support the hypothesis that pretrained entity embeddings are crucial for extracting meaningful, language-agnostic features.
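The contextualized word retrieval evaluation can be pictured as a nearest-neighbour search in representation space: for a query word in one language, retrieve its counterpart in a parallel sentence in another language. The sketch below uses placeholder tensors standing in for encoder outputs and only illustrates the retrieval rule, not the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

hidden_size = 768
query_word = torch.randn(hidden_size)        # e.g. one word of an English sentence
target_words = torch.randn(12, hidden_size)  # all words of the parallel sentence

# Cosine similarity between the query word and every candidate target word;
# the most similar target word is taken as the retrieved counterpart.
similarities = F.cosine_similarity(query_word.unsqueeze(0), target_words, dim=-1)
retrieved_index = int(similarities.argmax())
```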

This advancement holds practical implications, especially in enhancing AI systems for applications involving multilingual texts. Future research directions might include optimizing memory efficiency while maintaining robust entity representation, or exploring further pretraining strategies involving hierarchical entity embeddings.

Conclusion

The paper substantiates the significance of entity representations in strengthening multilingual language models. The findings suggest substantial potential for leveraging such representations to improve language-related AI functionality across varied linguistic contexts, marking a promising direction for future work on entity-aware multilingual systems.
