
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification (2401.16475v2)

Published 29 Jan 2024 in cs.CL

Abstract: Text simplification aims to make technical texts more accessible to laypeople but often results in the deletion of information and in vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently, and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial LLMs, and a natural language inference pipeline. With a novel evaluation framework that considers both the correctness of QA pairs and their linguistic suitability, our expert evaluation reveals that models struggle to reliably identify information loss and to apply standards similar to those of humans for what constitutes information loss.


Summary

  • The paper introduces a framework that uses linguist-curated QA pairs to pinpoint and address missing information in simplified texts.
  • It compares two methods, direct LLM prompting and an NLI pipeline, with the latter identifying lost information more reliably than open-source LLMs through atomic fact entailment.
  • The study sets the stage for interactive AI tools and user-centric design by bridging gaps between human and machine assessments in text simplification.

Introduction

Text simplification is an important tool for enhancing accessibility of technical content to a broader audience, particularly in specialized domains such as medicine. However, simplification can inadvertently lead to information loss, creating challenges for laypeople who wish to understand complex texts in their entirety. Addressing this issue, researchers have developed InfoLossQA, a methodology to identify and compensate for information omitted or obscured due to simplification processes. This paper explores how the InfoLossQA framework, through the use of linguist-curated question-and-answer (QA) pairs, detects and mitigates the effects of information loss for lay readers.

The InfoLossQA Framework

Central to InfoLossQA is the generation of QA pairs that pinpoint exactly what information a simplified text lacks compared to its original form. Inspired by theories in pragmatics and discourse, specifically the Questions Under Discussion framework, the QA pairs are written so that lay readers do not need direct access to the original text: each question is understandable from the simplified content alone, and its answer supplies the details that were lost.
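
To make the idea concrete, one can picture each QA pair as a small record linking a question and answer to evidence in the two texts. The sketch below is illustrative only; the field names and loss labels are our assumptions, not the schema of the paper's released data.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class InfoLossQAPair:
    """One QA pair characterizing information lost in a simplification.

    Field names and label values are illustrative assumptions, not the
    schema of the paper's released dataset.
    """
    question: str                               # answerable from the simplification alone
    answer: str                                 # supplies the lost detail in lay language
    original_evidence: List[Tuple[int, int]]    # character spans in the original abstract
    simplified_evidence: List[Tuple[int, int]]  # spans in the simplification, if the loss is localized there
    loss_type: str                              # e.g. "omission" vs. "oversimplification" (hypothetical labels)
```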

In the paper's dataset, 1,000 QA pairs were curated by linguists from 104 LLM simplifications of medical abstracts. These pairs mark lost specificity and expose omissions and vagueness in the simplifications. The researchers additionally introduce two methods to carry out this task automatically: direct end-to-end prompting of LLMs, and a natural language inference (NLI) pipeline that couples atomic-fact entailment with localized QA generation.
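
As a rough illustration of the first method, end-to-end prompting can be sketched as a single call that hands both texts to a chat model and asks for QA pairs. The prompt wording, model name, and OpenAI-style client below are our own assumptions, not the authors' exact setup.

```python
# Minimal sketch of the end-to-end prompting baseline (prompt text and model
# name are our own choices, not the authors' exact configuration).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = """You are given an original medical abstract and its simplification.
List question-answer pairs that recover information present in the original
but missing or vague in the simplification. Questions must be understandable
from the simplification alone; answers must supply the lost detail in lay language.

Original:
{original}

Simplification:
{simplified}

QA pairs:"""

def generate_infoloss_qa(original: str, simplified: str, model: str = "gpt-4") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(
            original=original, simplified=simplified)}],
        temperature=0.0,  # deterministic output for easier evaluation
    )
    return response.choices[0].message.content
```

Swapping in an open-source model would only change this call site; the overall recipe of passing both texts and requesting QA pairs stays the same.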

Empirical Findings

In the expert evaluation of the different models, the paper reports that while LLMs handle the QA format competently, they fall short in reliably pinpointing instances of information loss. This highlights a critical gap between machine performance and human judgment in recognizing and quantifying simplification-induced information loss. Notably, the NLI pipeline, which relies on entailment reasoning over atomic facts, identified information loss more reliably than open-source LLMs.
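
The core of the NLI pipeline can be approximated as an entailment filter: atomic facts extracted from the original are kept as candidate losses when the simplification does not entail them. The sketch below uses an off-the-shelf MNLI model and leaves the fact extraction and QA generation steps (which the paper handles with LLM components) out of scope; model and threshold are our own choices.

```python
# Sketch of the NLI pipeline's filtering step: atomic facts from the original
# abstract that the simplification does not entail are candidate information
# losses. Fact extraction and QA generation are not shown here.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
ENTAILMENT_ID = model.config.label2id["ENTAILMENT"]

def is_entailed(premise: str, hypothesis: str, threshold: float = 0.5) -> bool:
    """Return True if the NLI model judges `premise` to entail `hypothesis`."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    return probs[ENTAILMENT_ID].item() >= threshold

def candidate_losses(atomic_facts: list[str], simplified: str) -> list[str]:
    """Facts stated in the original but not entailed by the simplification."""
    return [fact for fact in atomic_facts if not is_entailed(simplified, fact)]
```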

Implications and Future Directions

The implications of this research are substantial. InfoLossQA serves not only as a diagnostic tool for the analysis of information loss in text simplification but also as a means to introduce rich metadata that could empower interactive AI tools aiding comprehension. The study's technical contributions, particularly the comprehensive framework evaluating models' ability to generate pertinent and readable QAs, set a benchmark for future developments in both text simplification and the broader landscape of LLM evaluation.
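
The paper's evaluation itself relies on expert judgments of correctness and linguistic suitability; no fully automatic proxy is claimed by the authors. Purely as an illustration of how model output could be compared against the linguist-curated references, the following sketch matches predicted questions to reference questions by embedding similarity; the encoder choice and threshold are our assumptions.

```python
# Illustrative automatic proxy only: the paper evaluates with human experts.
# Here, model-generated questions are matched to the linguist-curated
# reference questions by sentence-embedding similarity.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def reference_coverage(predicted: list[str], reference: list[str],
                       threshold: float = 0.7) -> float:
    """Fraction of reference questions matched by at least one prediction."""
    if not reference:
        return 1.0
    if not predicted:
        return 0.0
    sims = util.cos_sim(encoder.encode(reference), encoder.encode(predicted))
    matched = sims.max(dim=1).values >= threshold
    return float(matched.float().mean())
```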

One of the core challenges moving forward will be bridging the gap between human and machine standards of information completeness, and ensuring that simplifications preserve the integrity of the original content without loss of critical information. The study also underscores the need for iterative, user-centered design in which feedback mechanisms are embedded within simplification tools. This paper lays essential groundwork for future expansions across different languages, genres, and modes of simplification, advancing towards AI-driven text simplification that is responsible and usable by all.
