Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision (2003.12218v5)

Published 27 Mar 2020 in cs.CL, cs.AI, and cs.IR

Abstract: We created this CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13). This CORD-NER dataset covers 75 fine-grained entity types: In addition to the common biomedical entity types (e.g., genes, chemicals and diseases), it covers many new entity types related explicitly to the COVID-19 studies (e.g., coronaviruses, viral proteins, evolution, materials, substrates and immune responses), which may benefit research on COVID-19 related virus, spreading mechanisms, and potential vaccines. CORD-NER annotation is a combination of four sources with different NER methods. The quality of CORD-NER annotation surpasses SciSpacy (over 10% higher on the F1 score based on a sample set of documents), a fully supervised BioNER tool. Moreover, CORD-NER supports incrementally adding new documents as well as adding new entity types when needed by adding dozens of seeds as the input examples. We will constantly update CORD-NER based on the incremental updates of the CORD-19 corpus and the improvement of our system.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Xuan Wang (205 papers)
  2. Xiangchen Song (22 papers)
  3. Bangzheng Li (23 papers)
  4. Yingjun Guan (4 papers)
  5. Jiawei Han (263 papers)
Citations (56)

Summary

We haven't generated a summary for this paper yet.