Benchmarking Large Language Models in Biomedical Triple Extraction (2310.18463v6)
Published 27 Oct 2023 in cs.CL
Abstract: Biomedical triple extraction systems aim to automatically extract biomedical entities and the relations between them. The application of large language models (LLMs) to triple extraction remains relatively unexplored. In this work, we focus mainly on sentence-level biomedical triple extraction. Furthermore, the absence of a high-quality biomedical triple extraction dataset impedes progress in developing robust triple extraction systems. To address these challenges, we first compare the performance of various LLMs. We also present GIT, an expert-annotated biomedical triple extraction dataset that covers a wider range of relation types.
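As a rough illustration of the task described in the abstract, the sketch below represents a sentence-level triple as a (head, relation, tail) tuple and scores predicted triples against gold triples by exact match. The entity names, relation labels, and the `exact_match_scores` helper are hypothetical and chosen only for illustration; they are not taken from GIT, and exact-match scoring is an assumption rather than the paper's stated evaluation protocol.

```python
from typing import Iterable, NamedTuple, Tuple


class Triple(NamedTuple):
    """A sentence-level triple: head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str


def exact_match_scores(
    predicted: Iterable[Triple], gold: Iterable[Triple]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) where a prediction counts as correct
    only if head, relation, and tail all match a gold triple exactly."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and model output for one sentence.
gold = [
    Triple("aspirin", "treats", "headache"),
    Triple("aspirin", "inhibits", "cyclooxygenase"),
]
predicted = [
    Triple("aspirin", "treats", "headache"),
    Triple("aspirin", "causes", "headache"),  # wrong relation, so no match
]

precision, recall, f1 = exact_match_scores(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is strict: a prediction with the right entities but the wrong relation earns no credit, which is one reason a dataset's range of relation types matters when comparing systems.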
- Mingchen Li
- Huixue Zhou
- Rui Zhang