Evaluation of LLMs on Long-tail Entity Linking in Historical Documents
Entity Linking (EL) remains an essential yet challenging task in NLP, requiring systems to disambiguate entity mentions and associate them with the appropriate entries in a structured knowledge base (KB) such as Wikidata. The paper "Evaluation of LLMs on Long-tail Entity Linking in Historical Documents" investigates the challenges that long-tail entities pose for EL, particularly when leveraging LLMs such as GPT and Llama 3, and contrasts these with traditional EL approaches.
Context and Objective
Entity Linking is foundational to NLP as it enriches natural language text with structured knowledge through KBs, improving comprehension and enabling sophisticated content analysis. However, linking less common 'long-tail' entities is particularly complex due to their underrepresentation in training datasets and KBs. This paper aims to assess the efficacy of LLMs in enhancing EL, focusing on long-tail entity linking in historical documents, an area that is notably underexplored.
Methodology and Comparative Analysis
Two prominent LLMs, GPT and Llama 3, are evaluated in this paper using the MHERCL v0.1 dataset, which consists of manually annotated sentences drawn from domain-specific historical texts. Their performance is compared with ReLiK, a state-of-the-art Entity Linking and Relation Extraction framework. A distinctive aspect of this paper is its reliance on the deep contextual understanding of LLMs, hypothesized to improve on traditional EL methods by narrowing the performance gap between frequent and infrequent entities.
- Models Used: GPT-3.5-turbo-instruct and Llama 3 (8B- and 70B-parameter versions) are employed for their expansive pre-training and strong language understanding.
- Dataset: MHERCL v0.1 contains sentences enriched with niche, historical knowledge—ideal for assessing EL in long-tail scenarios.
- Baseline: The performance of the LLMs is compared to ReLiK, known for its high inference speed and accuracy in entity linking.
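The paper does not reproduce its prompt templates, but the general workflow of prompting an LLM to link a mention and then parsing its reply can be sketched as follows. The prompt wording, function names, and the QID-only answer format are illustrative assumptions, not the paper's actual setup:

```python
import re

def build_el_prompt(sentence: str, mention: str) -> str:
    """Build a zero-shot entity-linking prompt (hypothetical wording,
    not the template used in the paper)."""
    return (
        "Link the entity mention to its Wikidata entry.\n"
        f"Sentence: {sentence}\n"
        f"Mention: {mention}\n"
        "Answer with the Wikidata QID only (e.g. Q254), or NIL if no entry exists."
    )

def parse_qid(model_output: str) -> str:
    """Extract the first QID-shaped token from the model's raw reply;
    fall back to NIL when none is found."""
    match = re.search(r"\bQ\d+\b", model_output)
    return match.group(0) if match else "NIL"
```

Post-hoc parsing like `parse_qid` matters in practice because generative models often wrap the answer in free text rather than emitting the identifier alone.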
Results and Findings
The outcomes reveal that while ReLiK demonstrates high precision, its recall for long-tail entities is limited (45.7%). In contrast, Llama-3-70B achieves a recall of 60.3%, significantly surpassing ReLiK in retrieving long-tail entities. This indicates that LLMs may serve as valuable tools for detecting a broader range of entities. However, the LLMs struggle with precision, likely because they tend to hallucinate plausible but non-existent entities when given limited context. Despite this shortcoming, the results suggest that with improved prompting strategies or context augmentation, LLMs could offer competitive performance in long-tail EL tasks.
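The precision/recall trade-off described above can be made concrete with a standard micro-averaged computation over gold and predicted links. This is generic metric code under the assumption that links are compared as exact (mention, QID) matches; it is not the paper's evaluation script:

```python
def precision_recall(gold: set, predicted: set) -> tuple:
    """Micro precision and recall over sets of gold and predicted links.

    A true positive is a predicted link that exactly matches a gold link;
    precision divides by the number of predictions, recall by the number
    of gold links. Empty sets yield 0.0 to avoid division by zero.
    """
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

Under this scheme, a system that hallucinates extra entities inflates the denominator of precision without affecting recall, which matches the failure mode reported for the LLMs.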
Implications and Future Directions
The findings underscore the potential for LLMs to transform EL practices, providing substantial gains in recall over traditional systems in niche domains. This suggests that LLMs hold promise for enhancing retrieval and disambiguation in complex scenarios where long-tail entities dominate. Future work should investigate advanced techniques such as In-Context Learning (ICL) or Knowledge Injection to better harness LLMs in EL, optimizing the balance between recall and precision. Additionally, employing LLMs as hybrid components alongside specialized EL systems like ReLiK could further broaden the scope and efficacy of handling domain-specific and long-tail entities.
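The In-Context Learning direction mentioned above amounts to prepending a few solved examples to the prompt so the model can imitate the expected answer format. The following sketch assumes a simple (sentence, mention, QID) demonstration layout; the paper does not specify one:

```python
def build_icl_prompt(examples: list, sentence: str, mention: str) -> str:
    """Assemble a few-shot entity-linking prompt from solved demonstrations.

    `examples` is a list of (sentence, mention, qid) triples; the target
    instance is appended last with the QID left blank for the model to fill.
    This layout is a hypothetical illustration of the ICL idea.
    """
    demos = "\n\n".join(
        f"Sentence: {s}\nMention: {m}\nQID: {q}" for s, m, q in examples
    )
    return (
        "Link each mention to its Wikidata QID.\n\n"
        f"{demos}\n\n"
        f"Sentence: {sentence}\nMention: {mention}\nQID:"
    )
```

Choosing demonstrations that resemble the target domain (here, historical music texts) is typically what drives ICL gains, which is why the paper flags it as a candidate fix for the precision gap.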
In summary, the paper provides valuable insights into the evolving landscape of EL, highlighting the broader implications of deploying LLMs in challenging long-tail scenarios and encouraging further exploration into refining LLM applications within NLP.