MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction (2204.04779v2)

Published 10 Apr 2022 in cs.CL and cs.LG

Abstract: Relation extraction in the biomedical domain is challenging because labeled data are scarce and annotation is costly, requiring domain experts. Distant supervision is commonly used to tackle the scarcity of annotated data by automatically pairing knowledge graph relationships with raw texts. Such a pipeline is prone to noise and is harder to scale when it must cover a large number of biomedical concepts. We investigated existing broad-coverage distantly supervised biomedical relation extraction benchmarks and found a significant overlap between training and test relationships, ranging from 26% to 86%. Furthermore, we noticed several inconsistencies in the data construction process of these benchmarks, and where there is no train-test leakage, the focus is on interactions between narrower entity types. This work presents MedDistant19, a more accurate benchmark for broad-coverage distantly supervised biomedical relation extraction that addresses these shortcomings and is obtained by aligning MEDLINE abstracts with the widely used SNOMED Clinical Terms knowledge base. Because domain-specific LLMs have not been thoroughly evaluated on this task, we also conduct experiments testing whether general-domain relation extraction findings carry over to biomedical relation extraction.
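
The train-test overlap mentioned in the abstract can be illustrated with a small sketch. This is not the authors' measurement procedure. It assumes that a "relationship" is a (head, relation, tail) triple and that overlap means the fraction of unique test triples that also occur in the training set. The function name `triple_overlap` and the toy triples are hypothetical, invented for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (head entity, relation, tail entity)
Triple = Tuple[str, str, str]


def triple_overlap(train: Iterable[Triple], test: Iterable[Triple]) -> float:
    """Fraction of unique test triples that also occur among the training triples."""
    train_set: Set[Triple] = set(train)
    test_set: Set[Triple] = set(test)
    if not test_set:
        return 0.0  # no test triples, so nothing can leak
    return len(test_set & train_set) / len(test_set)


# Toy triples, invented for illustration only (not taken from any benchmark)
train_triples = [
    ("aspirin", "may_treat", "headache"),
    ("insulin", "may_treat", "diabetes mellitus"),
    ("femur", "part_of", "lower limb"),
]
test_triples = [
    ("aspirin", "may_treat", "headache"),  # also in train: counts as leakage
    ("ibuprofen", "may_treat", "fever"),   # unseen in train
]

overlap = triple_overlap(train_triples, test_triples)
print(f"{overlap:.0%} of test triples also appear in training")
```

On this toy data, one of the two test triples is shared with training, so the overlap is 50%. A benchmark free of this kind of leakage would score 0% under the same definition.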

Authors (5)
  1. Saadullah Amin (5 papers)
  2. Pasquale Minervini (88 papers)
  3. David Chang (4 papers)
  4. Pontus Stenetorp (68 papers)
  5. Günter Neumann (9 papers)