AmbigDocs: Reasoning across Documents on Different Entities under the Same Name (2404.12447v3)

Published 18 Apr 2024 in cs.CL

Abstract: Different entities with the same name can be difficult to distinguish. Handling confusing entity mentions is a crucial skill for LLMs (LMs). For example, given the question "Where was Michael Jordan educated?" and a set of documents discussing different people named Michael Jordan, can LMs distinguish entity mentions to generate a cohesive answer to the question? To test this ability, we introduce a new benchmark, AmbigDocs. By leveraging Wikipedia's disambiguation pages, we identify a set of documents, belonging to different entities who share an ambiguous name. From these documents, we generate questions containing an ambiguous name and their corresponding sets of answers. Our analysis reveals that current state-of-the-art models often yield ambiguous answers or incorrectly merge information belonging to different entities. We establish an ontology categorizing four types of incomplete answers and automatic evaluation metrics to identify such categories. We lay the foundation for future work on reasoning across multiple documents with ambiguous entities.

Summary

  • The paper introduces a novel dataset, AmbigDocs, challenging LMs to disambiguate entities across multiple documents with Wikipedia-based examples.
  • It details a rigorous construction pipeline yielding 36K examples and an ontology that categorizes four types of incomplete answers.
  • Experimental results reveal that LMs struggle to disambiguate entities precisely, and that in-context demonstrations and higher-quality input documents improve performance.

Exploring Entity Disambiguation in Multi-Document Settings: Introducing AmbigDocs

Introduction to AmbigDocs

Distinguishing between different entities that share the same name is non-trivial in multi-document settings, and it matters for the performance of LLMs (LMs) in information retrieval and natural language understanding. The paper introduces a new dataset, AmbigDocs, designed to challenge and evaluate how well state-of-the-art LMs handle entity disambiguation across multiple documents. The dataset leverages Wikipedia's disambiguation pages to create a synthetic setup where each instance consists of a question about an ambiguous entity and a set of documents, each corresponding to a distinct entity referred to by the same name.

Dataset Construction

The AmbigDocs dataset is constructed through a structured pipeline that ensures diversity and complexity:

  • Surface Name Identification: A 'surface name' representing the ambiguous entity is identified.
  • Disambiguated Entities and Documents: For each surface name, multiple 'disambiguated entities' are determined, each with corresponding documents extracted from Wikipedia, ensuring a rich contextual variance.
  • Question and Answer Generation: For every entity, questions are synthetically generated that are grounded in the corresponding documents but have different answers depending on which entity is targeted.

This construction yields a comprehensive suite of examples across different domains: 36K examples in total, encompassing over 102K unique entities.
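To make the instance format concrete, here is a minimal Python sketch of what a single AmbigDocs example might contain. The field names and sample answers are illustrative assumptions for exposition, not the released dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class AmbigDocsExample:
    # Field names are illustrative; the released dataset's schema may differ.
    surface_name: str    # shared ambiguous name appearing in the question
    question: str        # question that mentions only the surface name
    entities: list[str]  # disambiguated entity titles from Wikipedia
    documents: list[str] # one passage per disambiguated entity
    answers: list[str]   # gold answer for each entity, aligned by index

example = AmbigDocsExample(
    surface_name="Michael Jordan",
    question="Where was Michael Jordan educated?",
    entities=["Michael Jordan (basketball)", "Michael I. Jordan (scientist)"],
    documents=["<passage about the basketball player>",
               "<passage about the machine-learning researcher>"],
    answers=["University of North Carolina", "University of California, San Diego"],
)
```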

Evaluation Framework

A systematic evaluation framework is proposed to assess how well LMs handle entity ambiguity within the provided documents:

  • Types of Incomplete Answers: An ontology is defined that distinguishes complete answers from four types of incomplete answers (partial, ambiguous, merged, and no answer), diagnosing different aspects of the models' capabilities and shortcomings.
  • Automated Metrics: Alongside traditional metrics like answer recall and entity recall, a novel 'Entity-Answer Recall' metric is introduced that measures whether the model links each correct answer to its corresponding entity (see the sketch after this list).
  • Manual and Automatic Categorization: Both manual annotations and an automated classification heuristic are employed for robust evaluation of the model outputs against the complex scenarios presented in AmbigDocs.
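As a rough illustration of the Entity-Answer Recall idea, the sketch below counts a gold (entity, answer) pair as recalled when the entity mention and its answer co-occur in the same sentence of the model output. This sentence-level co-occurrence criterion is an assumption made here for illustration; the paper's exact matching procedure may differ.

```python
import re

def sentences(text: str) -> list[str]:
    # Naive sentence splitter; a real implementation would use a proper tokenizer.
    return re.split(r"(?<=[.!?])\s+", text)

def entity_answer_recall(output: str, gold_pairs: list[tuple[str, str]]) -> float:
    """Fraction of gold (entity, answer) pairs whose entity mention and
    answer co-occur in the same sentence of the model output.

    Assumption: sentence-level co-occurrence stands in for the paper's
    matching criterion, which may be stricter or looser.
    """
    if not gold_pairs:
        return 0.0
    sents = [s.lower() for s in sentences(output)]
    hits = 0
    for entity, answer in gold_pairs:
        e, a = entity.lower(), answer.lower()
        if any(e in s and a in s for s in sents):
            hits += 1
    return hits / len(gold_pairs)

output = ("Michael Jordan the basketball player was educated at the "
          "University of North Carolina, while Michael I. Jordan the "
          "scientist earned his PhD at the University of California, San Diego.")
pairs = [("basketball player", "University of North Carolina"),
         ("scientist", "University of California, San Diego")]
print(entity_answer_recall(output, pairs))  # 1.0
```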

Experimental Insights

Initial experiments with various LLMs, including both open-source models like Llama2 and commercial systems like GPT-4, reveal significant challenges:

  • Entity Disambiguation: Most models struggle to clearly disambiguate entities when multiple documents are provided, often merging details across entities or failing to attach specific answers to specific entities.
  • Impact of Input Quality: In-context demonstrations and higher-quality input documents notably improve performance, suggesting avenues for further tuning of LMs in ambiguous settings; a plausible prompt-assembly sketch follows this list.
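For concreteness, here is one plausible way to assemble the multi-document input that such an experiment feeds to an LM, with an optional in-context demonstration prepended. The paper's exact prompt wording is not reproduced here, so this format is an assumption.

```python
def build_prompt(question: str, documents: list[tuple[str, str]],
                 demonstration: str | None = None) -> str:
    """Assemble a reading-comprehension prompt over multiple documents.

    `documents` is a list of (title, passage) pairs, one per disambiguated
    entity. The layout below is an illustrative assumption, not the exact
    prompt used in the paper.
    """
    parts = []
    if demonstration:  # optional in-context example, which the paper found helpful
        parts.append(demonstration)
    for i, (title, passage) in enumerate(documents, start=1):
        parts.append(f"Document {i} ({title}): {passage}")
    parts.append(f"Question: {question}")
    parts.append("Answer for each entity mentioned in the documents:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Where was Michael Jordan educated?",
    [("Michael Jordan (basketball)", "<passage>"),
     ("Michael I. Jordan (scientist)", "<passage>")],
)
print(prompt)
```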

Implications and Future Work

The paper highlights a critical area in natural language processing — dealing with ambiguity in real-world data. The insights provided could potentially influence future designs of LMs, particularly in their application to tasks like search and information retrieval where disambiguating entities is a common challenge. Future research could explore more sophisticated mechanisms for improving entity recognition and disambiguation, potentially integrating more granular contextual understanding and cross-document entity linkage.

In conclusion, AmbigDocs presents a new frontier for testing and enhancing the reasoning capabilities of LMs across ambiguous, multi-entity documents, setting a foundation for more nuanced and capable natural language understanding systems in complex informational landscapes.