How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain (2307.00186v2)
Abstract: Recent advancements in language models (LMs) have led to the emergence of powerful models such as Small LMs (e.g., T5) and Large LMs (e.g., GPT-4). These models have demonstrated exceptional capabilities across a wide range of tasks, such as named entity recognition (NER) in the general domain. (We define SLMs as pre-trained models with fewer parameters than models like GPT-3/3.5/4, e.g., T5, BERT, and others.) Nevertheless, their efficacy in the medical domain remains uncertain, and medical NER demands high accuracy because of the particularity of the field. This paper provides a thorough investigation comparing the performance of LMs on few-shot medical NER, answering the question of how far LMs are from 100% few-shot NER in the medical domain, and further explores an effective entity recognizer to improve NER performance. Based on extensive experiments conducted on 16 NER models spanning 2018 to 2023, our findings clearly indicate that LLMs outperform SLMs on few-shot medical NER tasks, given suitable examples and appropriate logical frameworks. Despite this overall superiority, LLMs still encounter challenges such as misidentification and wrong template prediction. Building on these findings, we introduce a simple and effective method called RT (Retrieving and Thinking), which acts as a retriever, finding relevant examples, and as a thinker, employing a step-by-step reasoning process. Experimental results show that the proposed RT framework significantly outperforms strong open baselines on two open medical benchmark datasets.
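To make the retrieve-then-think recipe concrete, below is a minimal sketch of an RT-style prompt builder for few-shot medical NER. Everything in it is an illustrative assumption rather than the paper's implementation: the TF-IDF retriever, the `POOL` of labeled sentences, the entity types, and the prompt wording are all stand-ins (the actual RT framework may use a different similarity measure and reasoning template).

```python
# Sketch of an RT-style (Retrieving and Thinking) prompt builder for few-shot
# medical NER. Assumptions: TF-IDF cosine similarity as the retriever, a tiny
# hand-labeled POOL, and a simple step-by-step cue as the "thinking" component.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical labeled pool: (sentence, annotated entities).
POOL = [
    ("Patients with type 2 diabetes were given metformin.",
     "Disease: type 2 diabetes | Chemical: metformin"),
    ("Aspirin reduced the risk of myocardial infarction.",
     "Disease: myocardial infarction | Chemical: aspirin"),
]

def retrieve(query: str, k: int = 1):
    """Retrieving: rank pool sentences by TF-IDF cosine similarity to the query."""
    texts = [s for s, _ in POOL] + [query]
    tfidf = TfidfVectorizer().fit_transform(texts)
    scores = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()
    top = scores.argsort()[::-1][:k]
    return [POOL[i] for i in top]

def build_prompt(query: str, k: int = 1) -> str:
    """Thinking: prepend retrieved demonstrations and ask for step-by-step reasoning."""
    demos = "\n\n".join(
        f"Sentence: {s}\nLet's identify each entity step by step.\nEntities: {e}"
        for s, e in retrieve(query, k)
    )
    return (f"{demos}\n\nSentence: {query}\n"
            "Let's identify each entity step by step.\nEntities:")

print(build_prompt("Ibuprofen is commonly used to treat rheumatoid arthritis."))
```

In this reading, the retrieved demonstrations supply the "suitable examples" and the step-by-step cue supplies the "appropriate logical framework" that the abstract identifies as the conditions under which LLMs outperform SLMs.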
- Dhananjay Ashok and Zachary C. Lipton. 2023. PromptNER: Prompting for named entity recognition. arXiv preprint arXiv:2305.15444.
- Learning in-context learning for named entity recognition. arXiv preprint arXiv:2305.11038.
- CONTAINER: Few-shot named entity recognition via contrastive learning. arXiv preprint arXiv:2109.07589.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- NCBI disease corpus: A resource for disease name recognition and concept normalization. Journal of Biomedical Informatics, 47:1–10.
- Few-shot classification in named entity recognition task. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, pages 993–1000.
- Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network. arXiv preprint arXiv:2006.05702.
- ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv preprint arXiv:1904.05342.
- COPNER: Contrastive learning with prompt guiding for few-shot named entity recognition. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2515–2527.
- Few-shot named entity recognition with entity-level prototypical network enhanced by dispersedly distributed prototypes. arXiv preprint arXiv:2208.08023.
- BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240.
- BioCreative V CDR task corpus: A resource for chemical disease relation extraction. Database, 2016.
- A hierarchical n-gram framework for zero-shot link prediction. arXiv preprint arXiv:2204.10293.
- Mingchen Li and Lifu Huang. 2023. Understand the dynamic world: An end-to-end knowledge informed framework for open domain entity state tracking. arXiv preprint arXiv:2304.13854.
- Mingchen Li and Jonathan Shihao Ji. 2022. Semantic structure based query graph prediction for question answering over knowledge graph. arXiv preprint arXiv:2204.10194.
- W-PROCER: Weighted prototypical contrastive learning for medical few-shot named entity recognition. arXiv preprint arXiv:2305.18624.
- Multi-fusion Chinese WordNet (MCW): Compound of machine learning and manual correction. arXiv preprint arXiv:2002.01761.
- QaNER: Prompting question answering models for few-shot named entity recognition. arXiv preprint arXiv:2203.01543.
- Large language model is not a good few-shot information extractor, but a good reranker for hard samples! arXiv preprint arXiv:2303.08559.
- Rethinking the role of demonstrations: What makes in-context learning work? arXiv preprint arXiv:2202.12837.
- Consumer health question answering using off-the-shelf components. In European Conference on Information Retrieval, pages 571–579. Springer.
- Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, 30.
- Amber Stubbs and Özlem Uzuner. 2015. Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. Journal of Biomedical Informatics, 58:S20–S29.
- GPT-NER: Named entity recognition via large language models. arXiv preprint arXiv:2304.10428.
- Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
- Sam Wiseman and Karl Stratos. 2019. Label-agnostic sequence labeling by copying nearest neighbors. arXiv preprint arXiv:1906.04225.
- Medical knowledge graph: Data sources, construction, reasoning, and applications. Big Data Mining and Analytics, 6(2):201–217.
- A large language model for electronic health records. npj Digital Medicine, 5(1):194.
- Simple and effective few-shot named entity recognition with structured nearest neighbor learning. arXiv preprint arXiv:2010.02405.
- Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601.
- Drug repurposing for COVID-19 via knowledge graph completion. Journal of Biomedical Informatics, 115:103696.
- Optimizing bi-encoder for named entity recognition via contrastive learning. arXiv preprint arXiv:2208.14565.
- SPRDA: A link prediction approach based on the structural perturbation to infer disease-associated piwi-interacting RNAs. Briefings in Bioinformatics, 24(1):bbac498.
Authors: Mingchen Li, Rui Zhang