
Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents (2306.13379v1)

Published 23 Jun 2023 in cs.CL

Abstract: The high volume of published chemical patents and the importance of timely acquisition of the information they contain motivate automating information extraction from chemical patents. Anaphora resolution is an important component of comprehensive information extraction and is critical for extracting reactions. In chemical patents, five anaphoric relations are of interest: co-reference, transformed, reaction associated, work up, and contained. Our goal is to investigate how the performance of anaphora resolution models for reaction texts in chemical patents differs between noise-free and noisy environments, and to what extent the models' robustness to noise can be improved.
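The "noisy environment" the abstract refers to can be simulated by perturbing clean reaction text before evaluation, for example with OCR-style character confusions common in digitized patents. The sketch below is a generic illustration of that idea, assuming a simple substitution-based noise model; it is not the paper's actual stress-test procedure, and the confusion table and parameter names are hypothetical.

```python
import random

# Hypothetical OCR-style confusion table (illustrative, not from the paper).
OCR_CONFUSIONS = {"l": "1", "O": "0", "m": "rn", "e": "c"}

def inject_noise(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly apply character substitutions at the given rate.

    A deterministic seed keeps perturbed evaluation sets reproducible,
    so model scores on clean vs. noisy text remain comparable.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in OCR_CONFUSIONS and rng.random() < rate:
            out.append(OCR_CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)

clean = "The title compound was obtained as a white solid."
noisy = inject_noise(clean, rate=0.3, seed=1)
```

A stress test would then compare the model's anaphora-resolution F1 on `clean` versus `noisy` inputs at several noise rates.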

