- The paper presents a novel BERT-based model that jointly encodes words and entities to incorporate global contextual information for improved disambiguation.
- It is trained on a large, entity-annotated Wikipedia corpus by predicting masked entities, and resolves mentions sequentially at inference time so that earlier decisions inform later ones.
- The approach achieves state-of-the-art results on five of six benchmark datasets, highlighting its effectiveness in handling complex ED tasks.
Global Entity Disambiguation with BERT
This paper addresses the task of Entity Disambiguation (ED) by extending BERT, a transformer-based architecture, to incorporate global contextual information. The model treats both words and entities as input tokens and disambiguates mentions sequentially, so that each decision can draw on the local textual context as well as the entities resolved so far, improving the coherence of the overall output.
Methodology
The model processes word and entity tokens in a single input sequence. It is trained on a large, entity-annotated corpus derived from Wikipedia by predicting randomly masked entities, so each prediction is conditioned on both the surrounding words and the non-masked entities. Inference proceeds sequentially: the model resolves one mention at a time based on the textual context and the entities already resolved, so each decision enriches the context available for subsequent ones.
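The following is a minimal sketch, not the authors' implementation, of the two ideas in this paragraph: a single input sequence mixing word and entity tokens, and a masked-entity prediction loss. Vocabulary sizes, hidden dimension, the `MASK_ENTITY_ID` value, and all tensor shapes are illustrative assumptions; position and mention-span embeddings used by the real model are omitted for brevity.

```python
# Sketch of joint word/entity encoding with masked-entity prediction (assumed setup).
import torch
import torch.nn as nn

WORD_VOCAB, ENTITY_VOCAB, HIDDEN = 30_000, 50_000, 256
MASK_ENTITY_ID = 0  # assumed id of a special [MASK] entity token

class JointEDModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.word_emb = nn.Embedding(WORD_VOCAB, HIDDEN)
        self.entity_emb = nn.Embedding(ENTITY_VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.entity_head = nn.Linear(HIDDEN, ENTITY_VOCAB)  # predicts an entity id per entity position

    def forward(self, word_ids, entity_ids):
        # Words and entities share one sequence, so self-attention mixes
        # local textual context with global entity context.
        x = torch.cat([self.word_emb(word_ids), self.entity_emb(entity_ids)], dim=1)
        h = self.encoder(x)
        entity_h = h[:, word_ids.size(1):]        # hidden states at entity positions
        return self.entity_head(entity_h)          # [batch, n_entities, ENTITY_VOCAB]

# Toy training step: mask a subset of annotated entities and predict their ids.
model = JointEDModel()
word_ids = torch.randint(1, WORD_VOCAB, (2, 32))       # placeholder word-piece ids
gold_entities = torch.randint(1, ENTITY_VOCAB, (2, 4))  # placeholder annotated entity ids
masked = gold_entities.clone()
masked[:, :2] = MASK_ENTITY_ID                           # mask some entities

logits = model(word_ids, masked)
is_masked = (masked == MASK_ENTITY_ID).reshape(-1)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, ENTITY_VOCAB)[is_masked],
    gold_entities.reshape(-1)[is_masked],
)
loss.backward()
```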
Experimental Results
The model is evaluated on six prominent ED datasets: AIDA-CoNLL, MSNBC, AQUAINT, ACE2004, WNED-WIKI, and WNED-CWEB. Notably, the model achieves state-of-the-art performance across five of these six datasets, demonstrating the benefits of incorporating global contextual information.
The authors compare their model against several strong baselines, including other transformer-based ED models. The confidence-order variant, which at each step resolves the mention the model is most confident about, consistently achieves the best results, particularly on mentions that require broader document-level context for disambiguation.
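Below is a minimal sketch of confidence-ordered inference, assuming a model with the interface of the earlier sketch. In a real system the argmax would typically be restricted to a per-mention candidate list rather than the full entity vocabulary; that detail, like the `mask_id` default, is an assumption here.

```python
# Sketch of confidence-order inference (not the authors' implementation):
# all mentions start masked; each step commits the single most confident
# prediction, which then serves as context for the remaining mentions.
import torch

def disambiguate(model, word_ids, n_mentions, mask_id=0):
    entity_ids = torch.full((1, n_mentions), mask_id)   # all mentions unresolved
    resolved = [False] * n_mentions

    for _ in range(n_mentions):
        with torch.no_grad():
            probs = model(word_ids, entity_ids).softmax(dim=-1)  # [1, n_mentions, V]

        best_pos, best_ent, best_conf = None, None, -1.0
        for pos in range(n_mentions):
            if resolved[pos]:
                continue
            conf, ent = probs[0, pos].max(dim=-1)
            if conf.item() > best_conf:
                best_pos, best_ent, best_conf = pos, ent.item(), conf.item()

        # Commit the most confident decision; it now informs later predictions.
        entity_ids[0, best_pos] = best_ent
        resolved[best_pos] = True

    return entity_ids
```

Resolving the easiest mentions first means the harder ones are decided later, when more committed entities are available as global context.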
Implications and Future Work
Incorporating global context via BERT marks a shift away from ED models that rely solely on local context, and suggests further gains from richer contextual training. A remaining limitation is that the model cannot handle entities absent from its entity vocabulary. Future work could address this, enabling broader applicability and robustness, especially in settings where the knowledge base evolves over time.
Conclusion
Treating both entities and words as contextual tokens within BERT represents a noteworthy methodological advance for ED. Through evaluation across diverse benchmark datasets, the approach shows how transformer architectures can exploit global context for entity-level decisions. As research in this area progresses, such models may serve a foundational role in advancing entity-centric natural language understanding systems.