Cross-lingual Argument Mining in the Medical Domain (2301.10527v3)

Published 25 Jan 2023 in cs.CL

Abstract: Nowadays the medical domain is receiving more and more attention in applications involving Artificial Intelligence, as clinicians' decision-making increasingly depends on dealing with enormous amounts of unstructured textual data. In this context, Argument Mining (AM) helps to meaningfully structure textual data by identifying the argumentative components in the text and classifying the relations between them. However, as is the case for many tasks in Natural Language Processing in general and in medical text processing in particular, the large majority of the work on computational argumentation has focused only on the English language. In this paper, we investigate several strategies to perform AM in medical texts for a language such as Spanish, for which no annotated data is available. Our work shows that automatically translating and projecting annotations (data-transfer) from English to a given target language is an effective way to generate annotated data without costly manual intervention. Furthermore, and contrary to conclusions from previous work on other sequence labelling tasks, our experiments demonstrate that data-transfer outperforms methods based on the cross-lingual transfer capabilities of multilingual pre-trained language models (model-transfer). Finally, we show how the automatically generated data in Spanish can also be used to improve results in the original English monolingual setting, thus providing a fully automatic data augmentation strategy.
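To make the "projecting annotations" step of data-transfer concrete, here is a minimal sketch of generic annotation projection for sequence labelling. It is not the authors' implementation. It assumes the English sentence has already been machine-translated, and that word alignments between source and target tokens are available as (source index, target index) pairs, e.g. from an automatic word aligner. The BIO labels `Claim` and `Premise` and the function names `bio_to_spans` and `project_spans` are hypothetical, chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# A labelled span: (start index, end index exclusive, label), e.g. (0, 3, "Claim").
Span = Tuple[int, int, Optional[str]]


def bio_to_spans(tags: Sequence[str]) -> List[Span]:
    """Convert a BIO tag sequence into (start, end_exclusive, label) spans.

    An I- tag that does not continue an open span of the same label is
    treated leniently as the start of a new span.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues_span = tag.startswith("I-") and tag[2:] == label
        if not continues_span:
            if start is not None:  # close the currently open span
                spans.append((start, i, label))
                start, label = None, None
            if tag.startswith(("B-", "I-")):  # open a new span
                start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def project_spans(
    src_tags: Sequence[str],
    alignment: Sequence[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project source BIO annotations onto a target sentence of length tgt_len.

    Each source span is mapped to the contiguous target range covering all
    target tokens aligned to it. Unaligned spans are dropped, and spans that
    would overlap an already projected span are skipped.
    """
    tgt_tags = ["O"] * tgt_len
    for start, end, label in bio_to_spans(src_tags):
        tgt_idx = sorted({t for s, t in alignment if start <= s < end})
        if not tgt_idx:
            continue  # nothing aligned: this component is lost in projection
        lo, hi = tgt_idx[0], tgt_idx[-1]
        if any(tag != "O" for tag in tgt_tags[lo : hi + 1]):
            continue  # conflict with an earlier projection: skip this span
        tgt_tags[lo] = f"B-{label}"
        for j in range(lo + 1, hi + 1):
            tgt_tags[j] = f"I-{label}"
    return tgt_tags


if __name__ == "__main__":
    src = ["Aspirin", "reduces", "mortality", "."]
    src_tags = ["B-Claim", "I-Claim", "I-Claim", "O"]
    tgt = ["La", "aspirina", "reduce", "la", "mortalidad", "."]
    alignment = [(0, 1), (1, 2), (2, 4), (3, 5)]  # hypothetical aligner output
    print(project_spans(src_tags, alignment, len(tgt)))
    # ['O', 'B-Claim', 'I-Claim', 'I-Claim', 'I-Claim', 'O']
```

Taking the range from the first to the last aligned target token keeps each projected argument component contiguous. In the example this also covers the unaligned article "la". Skipping conflicting spans is one simple policy among several possible ones.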
