
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation (2012.02420v2)

Published 4 Dec 2020 in cs.CL and cs.LG

Abstract: Patients with low health literacy often have difficulty understanding medical jargon and the complex structure of professional medical language. Although several studies have proposed methods to automatically translate expert language into language a layperson can understand, only a few address both accuracy and readability in the clinical domain. Simplification of clinical language therefore remains a challenging task that previous work has not fully addressed. To benchmark this task, we construct a new dataset, MedLane, to support the development and evaluation of automated clinical language simplification approaches. We also propose a new model, DECLARE, that follows the human annotation procedure and achieves state-of-the-art performance compared with eight strong baselines. To evaluate performance fairly, we also propose three specific evaluation metrics. Experimental results demonstrate the utility of the annotated MedLane dataset and the effectiveness of the proposed DECLARE model.

Authors (8)
  1. Junyu Luo
  2. Zifei Zheng
  3. Hanzhong Ye
  4. Muchao Ye
  5. Yaqing Wang
  6. Quanzeng You
  7. Cao Xiao
  8. Fenglong Ma
