Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates (2407.02432v1)
Abstract: An adverse drug effect (ADE) is any harmful event resulting from medical drug treatment. Despite their importance, ADEs are often under-reported in official channels. Some research has therefore turned to detecting discussions of ADEs in social media. Impressive results have been achieved in various attempts to detect ADEs. In a high-stakes domain such as medicine, however, an in-depth evaluation of a model's abilities is crucial. We address the issue of thorough performance evaluation in English-language ADE detection with hand-crafted templates for four capabilities: temporal order, negation, sentiment, and beneficial effect. We find that models with similar performance on held-out test sets have varying results on these capabilities.
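To make the idea of template-based capability testing concrete, here is a minimal sketch in Python. It is not the authors' test suite: the template sentences, the gold labels, the filler words, and the `keyword_baseline` classifier are all hypothetical, chosen only to show how a handful of templates can be expanded into labeled test cases and scored per capability.

```python
from collections import defaultdict
from itertools import product

# Hypothetical templates and gold labels (1 = ADE, 0 = no ADE).
# These are illustrative assumptions, not the templates used in the paper.
TEMPLATES = {
    "temporal_order": ("I already had {ade} before I ever took {drug}.", 0),
    "negation": ("I have been on {drug} for a month and never had {ade}.", 0),
    "sentiment": ("I love {drug}, even though it gives me {ade}.", 1),
    "beneficial_effect": ("Since starting {drug}, my {ade} is finally gone.", 0),
}

# Hypothetical filler vocabulary for the two placeholders.
FILLERS = {"drug": ["ibuprofen", "sertraline"], "ade": ["nausea", "insomnia"]}


def generate_cases(templates=TEMPLATES, fillers=FILLERS):
    """Expand every template with every (drug, ade) combination."""
    cases = []
    for capability, (template, label) in templates.items():
        for drug, ade in product(fillers["drug"], fillers["ade"]):
            cases.append((capability, template.format(drug=drug, ade=ade), label))
    return cases


def keyword_baseline(text):
    """Toy stand-in for a model: predicts ADE whenever a symptom word appears."""
    return int(any(ade in text for ade in FILLERS["ade"]))


def pass_rates(predict, cases):
    """Return the fraction of test cases passed, grouped by capability."""
    passed, total = defaultdict(int), defaultdict(int)
    for capability, text, label in cases:
        total[capability] += 1
        passed[capability] += int(predict(text) == label)
    return {capability: passed[capability] / total[capability] for capability in total}


if __name__ == "__main__":
    for capability, rate in pass_rates(keyword_baseline, generate_cases()).items():
        print(f"{capability}: {rate:.2f}")
```

Because every generated sentence mentions a symptom, the keyword baseline labels all of them as ADEs. It therefore passes only the capability whose assumed gold label is 1, which is the kind of gap that an aggregate score on a held-out test set can hide. Any real classifier can be plugged in by passing a different `predict` function to `pass_rates`.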
Authors: Dorothea MacPhail, David Harbecke, Lisa Raithel, Sebastian Möller