Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Revisiting DocRED -- Addressing the False Negative Problem in Relation Extraction (2205.12696v3)

Published 25 May 2022 in cs.CL and cs.IR

Abstract: The DocRED dataset is one of the most popular and widely used benchmarks for document-level relation extraction (RE). It adopts a recommend-revise annotation scheme so as to have a large-scale annotated dataset. However, we find that the annotation of DocRED is incomplete, i.e., false negative samples are prevalent. We analyze the causes and effects of the overwhelming false negative problem in the DocRED dataset. To address the shortcoming, we re-annotate 4,053 documents in the DocRED dataset by adding the missed relation triples back to the original DocRED. We name our revised DocRED dataset Re-DocRED. We conduct extensive experiments with state-of-the-art neural models on both datasets, and the experimental results show that the models trained and evaluated on our Re-DocRED achieve performance improvements of around 13 F1 points. Moreover, we conduct a comprehensive analysis to identify the potential areas for further improvement. Our dataset is publicly available at https://github.com/tonytan48/Re-DocRED.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Qingyu Tan (9 papers)
  2. Lu Xu (68 papers)
  3. Lidong Bing (144 papers)
  4. Hwee Tou Ng (44 papers)
  5. Sharifah Mahani Aljunied (7 papers)
Citations (51)

Summary

We haven't generated a summary for this paper yet.

Github Logo Streamline Icon: https://streamlinehq.com

GitHub