Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Entity Linking and Discovery via Arborescence-based Supervised Clustering (2109.01242v2)

Published 2 Sep 2021 in cs.CL and cs.LG

Abstract: Previous work has shown promising results in performing entity linking by measuring not only the affinities between mentions and entities but also those amongst mentions. In this paper, we present novel training and inference procedures that fully utilize mention-to-mention affinities by building minimum arborescences (i.e., directed spanning trees) over mentions and entities across documents in order to make linking decisions. We also show that this method gracefully extends to entity discovery, enabling the clustering of mentions that do not have an associated entity in the knowledge base. We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset, and show significant improvements in performance for both entity linking and discovery compared to identically parameterized models. We further show significant efficiency improvements with only a small loss in accuracy over previous work, which use more computationally expensive models.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Dhruv Agarwal (17 papers)
  2. Rico Angell (12 papers)
  3. Nicholas Monath (29 papers)
  4. Andrew McCallum (132 papers)
Citations (28)