
Know Your Audience: The benefits and pitfalls of generating plain language summaries beyond the "general" audience (2403.04979v1)

Published 8 Mar 2024 in cs.HC

Abstract: Large language models (LMs) show promise as tools for communicating science to the general public by simplifying and summarizing complex language. Because models can be prompted to generate text for a specific audience (e.g., college-educated adults), LMs might be used to create multiple versions of plain language summaries for people with different familiarities of scientific topics. However, it is not clear what the benefits and pitfalls of adaptive plain language are. When is simplifying necessary, what are the costs in doing so, and do these costs differ for readers with different background knowledge? Through three within-subjects studies in which we surface summaries for different envisioned audiences to participants of different backgrounds, we found that while simpler text led to the best reading experience for readers with little to no familiarity in a topic, high familiarity readers tended to ignore certain details in overly plain summaries (e.g., study limitations). Our work provides methods and guidance on ways of adapting plain language summaries beyond the single "general" audience.


Summary

  • The paper demonstrates that simplified summaries improve reading ease and understanding for audiences with low background knowledge through three controlled experiments.
  • It finds that expert readers may skip crucial content in overly simplified texts, indicating that one-size-fits-all language can hinder effective communication.
  • The study advocates for adaptive summarization strategies that balance clarity with detail, combining human oversight with machine generation to meet diverse reader needs.

Examining the Impact of Language Complexity and Reader Familiarity on Engagement with Scientific Summaries

Overview

Recent research explores how generating plain language summaries at varying levels of complexity affects engagement and comprehension for readers with different backgrounds. Through three distinct experiments, the paper examines the interaction between the complexity of scientific language and a reader's familiarity with the topic. It investigates whether simpler text leads to better reading experiences across the board, or whether the effects depend on the reader's background knowledge.

Methodology

The researchers employed both human-written and machine-generated summaries, ranging from high to low complexity, to represent scientific findings. The complexity levels targeted three audiences: researchers (high), college-educated adults (medium), and high school students (low). They conducted three studies: the first with expert-written summaries, the second and third with machine-generated summaries, with the third specifically aiming to maintain information content across complexity levels. Participant responses were collected in within-subjects experiments, with measures of reading ease, understanding, interest, value, and behavior (e.g., skipping sections or requesting the original article).
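The audience-targeted generation setup can be sketched as simple prompt templating. This is a minimal illustration under assumed wording: the paper does not publish its exact prompts, so the prompt text and the `build_prompt` helper below are hypothetical; only the three audience levels come from the study design.

```python
# Hypothetical prompt templates for the three audience levels described
# above. The exact prompts used in the paper are not reproduced here.
AUDIENCES = {
    "high": "researchers in the field",
    "medium": "college-educated adults with no background in the field",
    "low": "high school students",
}

def build_prompt(abstract: str, level: str) -> str:
    """Return a summarization prompt targeting one audience level."""
    audience = AUDIENCES[level]
    return (
        f"Summarize the following scientific abstract for {audience}. "
        "Keep all key findings and any stated study limitations.\n\n"
        f"Abstract: {abstract}"
    )

# Each prompt would then be sent to a language model to produce one
# summary version per audience level.
prompt = build_prompt("We study plain language summarization...", "low")
```

The instruction to keep stated limitations reflects one of the paper's concerns: overly plain summaries risk losing or burying exactly that content.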

Key Findings

The studies consistently showed that lower complexity summaries were preferred by participants with little familiarity with the article's subject, significantly improving their reading ease and understanding. However, as reader familiarity increased, this preference plateaued; more knowledgeable readers did not find simpler summaries more engaging or valuable than their complex counterparts. Notably, these readers were more likely to skip sections in simpler versions, which is especially concerning when those sections contained crucial information, such as a paper's limitations.

Interestingly, when the third study focused on preserving information content in simpler summaries, only readers with the least background knowledge continued to find these versions more accessible and understandable. This suggests a delicate balance between simplifying language and retaining comprehensive detail for effective science communication.
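One standard way to quantify how "simple" a summary is (a common readability measure, not a metric the paper itself reports here) is the Flesch reading ease formula, which rewards short sentences and short words. A minimal sketch with a naive syllable counter:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups, minimum one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: higher scores indicate simpler text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))
```

Scoring the different summary versions this way would confirm that the "low" versions are measurably simpler, though as the findings above show, surface readability alone says nothing about whether information content was preserved.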

Implications

This research highlights the nuanced role of language complexity in scientific communication, urging a move beyond a one-size-fits-all approach to plain language summarization. Tailoring language complexity to a reader's pre-existing knowledge can enhance engagement and comprehension, particularly for those less familiar with the topic. At the same time, oversimplification must be avoided, as it can lead readers to miss or disregard important information.

For science communicators and interface designers, these findings advocate for the generation of multiple summary versions catering to different knowledge levels. Moreover, this work underscores the importance of human oversight when using machine-generated summaries to mitigate the risk of inaccuracies or information loss.

Future Directions in AI and Science Communication

Looking ahead, the potential for adaptively generated summaries to facilitate broader public understanding of scientific research is immense. As AI and LLMs continue to evolve, so too will the strategies for effectively communicating complex scientific concepts to diverse audiences. Further exploration into personalized science communication, leveraging advanced AI capabilities while ensuring factual accuracy, promises to bridge the gap between scientific research and public discourse, making science more accessible to all.
