EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery and Indexing (2205.12570v1)

Published 25 May 2022 in cs.CL

Abstract: Existing work on Entity Linking mostly assumes that the reference knowledge base is complete, and therefore that all mentions can be linked. In practice, this is rarely the case: knowledge bases are incomplete and novel concepts arise constantly. This paper introduces the Unknown Entity Discovery and Indexing (EDIN) benchmark, in which unknown entities, that is, entities with neither a description in the knowledge base nor labeled mentions, have to be integrated into an existing entity linking system. By contrasting EDIN with zero-shot entity linking, we provide insight into the additional challenges it poses. Building on dense-retrieval-based entity linking, we introduce the end-to-end EDIN pipeline, which detects, clusters, and indexes mentions of unknown entities in context. Experiments show that indexing a single embedding per entity, unifying the information of multiple mentions, works better than indexing mentions independently.
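
The two indexing strategies contrasted at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say how the pipeline unifies mention information, so mean pooling of unit-normalized mention vectors is an assumption here, as are the toy 2-d embeddings, the cluster names, and the helper functions `build_entity_index`, `build_mention_index`, and `link`. None of these are taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_entity_index(clusters):
    """One embedding per cluster. ASSUMPTION: mean of unit-normalized mentions."""
    index = {}
    for entity_id, mentions in clusters.items():
        unit = [normalize(m) for m in mentions]
        dim = len(unit[0])
        mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
        index[entity_id] = normalize(mean)
    return index

def build_mention_index(clusters):
    """Alternative strategy: every mention is indexed as its own entry."""
    return [(entity_id, normalize(m))
            for entity_id, mentions in clusters.items()
            for m in mentions]

def link(query, entity_index):
    """Return the entity whose embedding has the highest dot product with the query."""
    q = normalize(query)
    return max(entity_index, key=lambda e: dot(q, entity_index[e]))

# Hypothetical clusters of mention embeddings for two unknown entities.
clusters = {
    "unknown_entity_A": [[1.0, 0.1], [0.9, 0.2]],
    "unknown_entity_B": [[0.1, 1.0], [0.0, 0.8]],
}
entity_index = build_entity_index(clusters)
mention_index = build_mention_index(clusters)
predicted = link([0.8, 0.1], entity_index)
```

In this sketch the entity-level index holds one vector per cluster, while the mention-level index grows with the number of mentions. The sketch only shows the structural difference and says nothing about which strategy is more accurate.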

Authors (5)
  1. Nora Kassner (22 papers)
  2. Fabio Petroni (37 papers)
  3. Mikhail Plekhanov (4 papers)
  4. Sebastian Riedel (140 papers)
  5. Nicola Cancedda (16 papers)
Citations (6)
