
Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference (2110.05362v1)

Published 11 Oct 2021 in cs.CL and cs.LG

Abstract: Performing event and entity coreference resolution across documents vastly increases the number of candidate mentions, making it intractable to do the full $n^2$ pairwise comparisons. Existing approaches simplify by considering coreference only within document clusters, but this fails to handle inter-cluster coreference, common in many applications. As a result, cross-document coreference algorithms are rarely applied to downstream tasks. We draw on an insight from discourse coherence theory: potential coreferences are constrained by the reader's discourse focus. We model the entities/events in a reader's focus as a neighborhood within a learned latent embedding space which minimizes the distance between mentions and the centroids of their gold coreference clusters. We then use these neighborhoods to sample only hard negatives to train a fine-grained classifier on mention pairs and their local discourse features. Our approach achieves state-of-the-art results for both events and entities on the ECB+, Gun Violence, Football Coreference, and Cross-Domain Cross-Document Coreference corpora. Furthermore, training on multiple corpora improves average performance across all datasets by 17.2 F1 points, leading to a robust coreference resolution model for use in downstream tasks where link distribution is unknown.
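To make the two core ideas in the abstract concrete, here is a minimal sketch (not the authors' released code) of (1) a loss that pulls mention embeddings toward their gold-cluster centroids and (2) neighborhood-based hard negative sampling, which selects the nearest mentions that are not coreferent. The helper names `centroid_loss` and `sample_hard_negatives`, the embedding shapes, and the brute-force distance computation are all illustrative assumptions.

```python
import numpy as np

def centroid_loss(embeddings, cluster_ids):
    """Mean squared distance between each mention embedding and the
    centroid of its gold coreference cluster (illustrative objective)."""
    loss = 0.0
    for c in np.unique(cluster_ids):
        members = embeddings[cluster_ids == c]
        centroid = members.mean(axis=0)
        loss += ((members - centroid) ** 2).sum()
    return loss / len(embeddings)

def sample_hard_negatives(embeddings, cluster_ids, k=5):
    """For each mention, return indices of the k nearest mentions that are
    NOT in the same gold cluster: the 'hard negatives' in its neighborhood."""
    # Pairwise squared Euclidean distances (brute force; a real system
    # would use approximate nearest-neighbor search at this scale).
    d = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                       # exclude self-pairs
    same = cluster_ids[:, None] == cluster_ids[None, :]
    d[same] = np.inf                                  # exclude true coreferents
    return np.argsort(d, axis=1)[:, :k]

# Toy usage: 6 mentions in 2 gold clusters.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))
gold = np.array([0, 0, 0, 1, 1, 1])
print(centroid_loss(emb, gold))
print(sample_hard_negatives(emb, gold, k=2))
```

The sampled hard negatives would then form the training pairs for the fine-grained mention-pair classifier the abstract describes, which avoids scoring all $n^2$ candidate pairs.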

Authors (3)
  1. William Held (17 papers)
  2. Dan Iter (16 papers)
  3. Dan Jurafsky (118 papers)
Citations (18)
