InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification (2401.16475v2)
Abstract: Text simplification aims to make technical texts more accessible to laypeople but often results in the deletion of information and in vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Questions Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently, and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial LLMs, and a natural language inference pipeline. With a novel evaluation framework considering the correctness of QA pairs and their linguistic suitability, our expert evaluation reveals that models struggle to reliably identify information loss and to apply standards comparable to humans' for what constitutes information loss.
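The core task the abstract describes, flagging where a simplification drops information present in the original, can be illustrated with a deliberately crude lexical proxy: flag original sentences whose content words are mostly absent from the simplified text. This is not the paper's method (which uses LLM prompting and a natural language inference pipeline); the function names, stopword list, and coverage threshold below are all illustrative assumptions.

```python
import re

# Minimal stopword list; a real system would use a proper resource.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are",
             "was", "were", "that", "this", "for", "with", "on", "at"}

def content_words(text):
    """Lowercase alphabetic tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def flag_information_loss(original_sentences, simplification, threshold=0.5):
    """Return original sentences whose content words are mostly missing
    from the simplification -- a rough lexical proxy for deleted content."""
    simple_vocab = content_words(simplification)
    flagged = []
    for sent in original_sentences:
        vocab = content_words(sent)
        if not vocab:
            continue
        coverage = len(vocab & simple_vocab) / len(vocab)
        if coverage < threshold:
            flagged.append(sent)
    return flagged
```

A sketch like this only detects surface-level deletion; the paper's QA-pair formulation additionally covers oversimplification and vagueness, which require semantic comparison (e.g., entailment models) rather than word overlap.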