
Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns (1810.05201v1)

Published 11 Oct 2018 in cs.CL

Abstract: Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun-name pairs sampled to provide diverse coverage of challenges posed by real-world text. We explore a range of baselines which demonstrate the complexity of the challenge, the best achieving just 66.9% F1. We show that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.

Authors (4)
  1. Kellie Webster
  2. Marta Recasens
  3. Vera Axelrod
  4. Jason Baldridge
