Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents (2306.13379v1)
Published 23 Jun 2023 in cs.CL
Abstract: The high volume of published chemical patents, together with the importance of acquiring their information in a timely manner, motivates automating information extraction from chemical patents. Anaphora resolution is an important component of comprehensive information extraction and is critical for extracting reactions. In chemical patents, there are five anaphoric relations of interest: co-reference, transformed, reaction associated, work up, and contained. Our goal is to investigate how the performance of anaphora resolution models for reaction texts in chemical patents differs between noise-free and noisy environments, and to what extent we can improve the model's robustness against noise.