Building a Corpus for Biomedical Relation Extraction of Species Mentions
Abstract: We present Species-Species Interaction, a manually annotated corpus for extracting meaningful binary relations between species at the sentence level in biomedical texts, with a focus on the gut microbiota. After evaluating several Named Entity Recognition (NER) species taggers, we use PubTator to annotate species mentions in full-text articles. Initial results with BERT and its biomedical variants are promising for extracting relations between species.