
BioMNER: A Dataset for Biomedical Method Entity Recognition (2406.20038v1)

Published 28 Jun 2024 in cs.CL

Abstract: Named entity recognition (NER) is a fundamental task in Natural Language Processing. Biomedical Method (BioMethod) NER is particularly challenging because scholarly literature continually introduces new domain-specific terminology. Current research on BioMethod NER suffers from a scarcity of resources, primarily because methodological concepts are intricate and require deep understanding to delineate precisely. In this study, we propose a novel dataset for biomedical method entity recognition, employing an automated BioMethod entity recognition and information retrieval system to assist human annotation. Furthermore, we comprehensively explore a range of conventional and contemporary open-domain NER methodologies, including cutting-edge large language models (LLMs) customised to our dataset. Our empirical findings reveal that the large parameter counts of LLMs surprisingly inhibit the effective assimilation of entity extraction patterns pertaining to biomedical methods. Remarkably, an approach that combines the modestly sized ALBERT model (only 11MB) with conditional random fields (CRF) achieves state-of-the-art (SOTA) performance.
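
To make the "encoder plus CRF" idea in the abstract concrete, the following is a minimal sketch of the decoding step of a linear-chain CRF over BIO tags. It is not the authors' implementation. The tag names, the emission scores (which in practice would come from an encoder such as ALBERT), and the transition scores are all made-up assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for method entities (an assumption, not from the paper).
TAGS = ["O", "B-Method", "I-Method"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[i][tag] is the per-token score for `tag` at position i.
    transitions[(prev, cur)] is the score for moving from `prev` to `cur`;
    pairs that are missing default to 0.0.
    """
    if not emissions:
        return []
    # score[tag] = best score of any path that ends in `tag` at the current token
    score = {t: emissions[0][t] for t in TAGS}
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            best_prev = max(
                TAGS, key=lambda p: score[p] + transitions.get((p, cur), 0.0)
            )
            new_score[cur] = (
                score[best_prev] + transitions.get((best_prev, cur), 0.0) + emit[cur]
            )
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores for a three-token sentence (illustrative values only).
emissions = [
    {"O": 2.0, "B-Method": 0.5, "I-Method": 0.0},
    {"O": 0.1, "B-Method": 1.0, "I-Method": 1.5},
    {"O": 0.2, "B-Method": 0.0, "I-Method": 2.0},
]
# A large penalty forbids starting an entity with I-Method directly after O.
transitions = {("O", "I-Method"): -1e4}

print(viterbi_decode(emissions, transitions))
```

Taking the top-scoring tag per token independently would give `O, I-Method, I-Method`, which is not a valid BIO sequence. The transition penalty lets the decoder prefer `O, B-Method, I-Method` instead, which is the kind of sequence-level constraint a CRF layer adds on top of per-token scores.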

Authors (7)
  1. Chen Tang (94 papers)
  2. Bohao Yang (16 papers)
  3. Kun Zhao (97 papers)
  4. Bo Lv (10 papers)
  5. Chenghao Xiao (21 papers)
  6. Frank Guerin (30 papers)
  7. Chenghua Lin (127 papers)