
Multilingual Simplification of Medical Texts (2305.12532v4)

Published 21 May 2023 in cs.CL

Abstract: Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text simplification has focused on monolingual settings, with the result that such evidence is available in only one language (most often, English). This work addresses this limitation via multilingual simplification, i.e., directly simplifying complex texts into simplified texts in multiple languages. We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain, covering four languages: English, Spanish, French, and Farsi. We evaluate fine-tuned and zero-shot models across these languages, with extensive human assessments and analyses. Although models can now generate viable simplified texts, we identify outstanding challenges that this dataset might be used to address.

Authors
  1. Sebastian Joseph
  2. Kathryn Kazanas
  3. Keziah Reina
  4. Vishnesh J. Ramanathan
  5. Wei Xu
  6. Byron C. Wallace
  7. Junyi Jessy Li
